A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

Save this PDF as:

Size: px
Start display at page:

Download "A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen"

Transcription

1 A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS Kuan-Chuan Peng and Tsuhan Chen School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14850, USA {kp388, ABSTRACT Most works related to convolutional neural networks (CNN) use the traditional CNN framework which extracts features in only one scale. We propose multi-scale convolutional neural networks (MSCNN) which can not only extract multi-scale features but also solve the issues of the previous methods which use CNN to extract multi-scale features. With the assumption of label-inheritable (LI) property, we also propose a method to generate exponentially more training examples for MSCNN from the given training set. Our experimental results show that MSCNN outperforms both the state-of-theart methods and the traditional CNN framework on artist, artistic style, and architectural style classification, supporting that MSCNN outperforms the traditional CNN framework on the tasks which at least partially satisfy LI property. Index Terms Convolutional neural networks, multiscale, label-inheritable 1. INTRODUCTION Convolutional neural networks (CNN) have received a great deal of focus in recent studies [1, 2, 3, 4, 5]. This trend started when Krizhevsky et al. [6] used CNN to achieve extraordinary performance in ImageNet [7] classification. Furthermore, Donahue et al. [8] show that the CNN used in Krizhevsky s work [6] can serve as a generic feature extractor, and that the extracted features outperform the state-of-the-art methods on standard benchmark datasets. Although recent researchers have shown breakthrough performance by using CNN, there are two constraints in the traditional CNN framework: 1: CNN extracts the features in only one scale without explicitly leveraging the features in different scales. 2: The performance of CNN highly depends on the amount of training data, which is shown in recent works [4, 8]. Using either Caltech-101 [9] or Caltech- 256 [10], these works [4, 8] modify the number of training examples per class and record the corresponding accuracy. Both works find that satisfactory performance can be achieved only when enough training examples are used. In this paper, we want to solve the above two issues of the traditional CNN framework. We propose multi-scale convolutional neural networks (MSCNN), a framework of extracting multi-scale features using multiple CNN. The details of MSCNN are explained in Sec. 3. We also propose a method to generate exponentially more training examples for MSCNN from the given training set based on the assumption that the task of interest is label-inheritable (LI). The concept of LI is explained in Sec. 2. In recent studies extracting multi-scale features using CNN, Gong et al. [2] propose multi-scale orderless pooling, but He et al. [3] propose spatial pyramid pooling. Both works show experimental results which support the argument that using multi-scale features outperforms using features in only one scale. In both works, the multi-scale features are extracted by using special pooling methods in the same CNN, so the CNN parameters used to extract multi-scale features are the same before the pooling stage. However, we argue that the features in different scales should be extracted with different sets of CNN parameters learned from the training examples in different scales, which is explicitly modeled in our proposed MSCNN. In the applications using multiple CNN, Lu et al. [11] predict pictorial aesthetics by using double-column convolutional neural network (DCNN) to extract features in two different scales, global view and fine-grained view. They also show that using DCNN outperforms using features in only one scale. The concept of using multiple CNN is similar in both DCNN and our proposed MSCNN. Nevertheless, the input images for the scale of fine-grained view in DCNN are generated by randomly cropping from the images in the scale of global view. Lu s approach [11] has two drawbacks: 1: The randomly cropped image may not well represent the entire image in the scale of fine-grained view. 2: By using only the cropped images in the scale of fine-grained view, the information which is not in the cropped area is ignored. In MSCNN, instead of throwing away information, we extract the features from every portion of the input image in each scale, which solves the above two drawbacks of DCNN. In this paper, we make the following contributions: We propose multi-scale convolutional neural networks (MSCNN) which can extract the features in different scales using multiple convolutional neural networks.

2 Fig. 1. The illustration of label-inheritable (LI) property. Given an image dataset D associated with a task T, if any cropped version of any image I from D can take the same label as that of I, we say that T satisfies LI property. In this figure, we only show the case when the cropped image is the upper right portion of the original image. In fact, the cropped image can be any portion of the original image. We show that MSCNN outperforms not only the stateof-the-art performance on three tasks (architectural style [12], artist, and artistic style classification [13]) but also the traditional CNN framework which only extracts features in one scale. We also propose a method to generate exponentially more training examples for MSCNN from the given training set. Under the label-inheritable (LI) assumption, our method generates exponentially more training examples without the need to collect new training examples explicitly. Although our method and MSCNN are designed under LI assumption, our experimental results support that our proposed framework can still outperform the traditional CNN framework on the tasks which only partially satisfy LI assumption. 2. LABEL-INHERITABLE PROPERTY Our proposed MSCNN is designed for the tasks satisfying label-inheritable property which is illustrated in Fig. 1. Given an image dataset D associated with a task T, if any cropped version of any image I from D can take the the same label as that of I, we say that T is label-inheritable (LI). In other words, if the concept represented by the label of any image I in the given dataset D is also represented in each portion of I, the corresponding task T satisfies LI property. Fig. 2 shows three examples from three different tasks which satisfy LI property in different degrees. We also list the corresponding dataset, task, and label under each example image. For any image from Painting-91 [13], each portion of that image is painted by the same artist (Picasso for the example image), so the task artist classification is LI. For the images from arcdataset [12], we can recognize the architectural style from different parts of the architecture (for example, the roof and pillars). However, if the cropped image does not contain any portion of the architecture, the cropped image cannot represent the architectural style of the Fig. 2. Example images from three different tasks which satisfy LI property in different degrees. The corresponding dataset, task, label, and the extent that LI property is satisfied are shown under each image. original image, which is the reason why LI property is only partially satisfied for architectural style classification. For the images from Caltech-101 [9], the LI property is satisfied when the cropped image contains the entire object which the label represents. If the cropped image contains only a portion of the object, the cropped image may not be able to take the same label. For the example image, the portion of the mouth cannot totally represent the label faces. Therefore, we think object classification is mostly not LI. In this work, we apply our proposed framework, MSCNN, to the tasks satisfying LI property in different degrees, showing that MSCNN outperforms the traditional CNN framework on the tasks satisfying LI property. We also show experimental results supporting that MSCNN can still outperform the traditional CNN framework on the tasks which only partially satisfy LI property. The details of the datasets and tasks used in this work are presented in Sec. 3.1, and the experimental results are shown in Sec Datasets and Tasks 3. EXPERIMENTAL SETUP Conducting experiment on 4 tasks from 3 datasets, we summarize all the datasets and tasks used in this work in Table 1, where their properties and related statistics are also shown. At the bottom of Table 1, we list the experimental setting associated with each task. In this paper, we refer to each task by its task ID listed in Table 1, where training/testing split indicates whether the training/testing splits are randomly generated or specified by the work proposing the dataset/task, and number of fold(s) shows that how many different training/testing splits are used for the task. To be consistent with prior works, we use the same experimental settings for these tasks as those used in the references listed at the bottom of Table 1.

3 Table 1. The tasks and associated datasets used in this work along with their properties. In this paper, we refer to each task by the corresponding task ID listed under each task. The experimental setting for each task is provided at the bottom of the table. dataset Painting-91 [13] Painting-91 [13] arcdataset [12] Caltech-101 [9] task artist artistic style architectural style object classification classification classification classification task ID ARTIST-CLS ART-CLS ARC-CLS CALTECH101 number of classes / number of images / image type painting painting architecture object label-inheritable yes mostly yes partial mostly no examples of class labels Rubens, Picasso. Baroque, Cubbism. Georgian, Gothic. chair, cup. number of training images / / 3030 number of testing images / / 2945 training/testing split specified [13] specified [13] random random number of fold(s) evaluation metric accuracy accuracy accuracy accuracy reference of the above setting [13] [13] [12] [4] 3.2. MSCNN Architecture The architecture of our proposed multi-scale convolutional neural networks (MSCNN) is illustrated in Fig. 3. MSCNN consists of m + 1 CNN which extract features in m + 1 different scales (from scale 0 to scale m, one CNN for each scale), and the structures of the m + 1 CNN are exactly the same. We only show m = 1 in Fig. 3 for clarity. In each scale, we use Krizhevsky s CNN [6] as the feature extractor which takes an image as input and outputs a 4096-d feature vector. The 4096-d feature vector is retrieved from the output of the topmost fully connected layer of Krizhevsky s CNN [6]. Given an input image, we extract its multi-scale features according to the following steps: 1. In scale i, we place a 2 i 2 i grid on the input image and generate 4 i cropped images. 2. We resize each of the 4 i cropped images in scale i to a pre-defined image size k k. 3. Each of the 4 i cropped resized images takes turn (in the order of left to right, top to bottom) to be the input of the CNN in scale i. In other words, the CNN in scale i extracts features 4 i times with one of the 4 i cropped resized images as input at each time. Afterward we generate the feature vector in scale i by cascading those 4 i 4096-d feature vectors in order. 4. The final multi-scale feature vector is generated by cascading all the feature vectors in all m + 1 scales in order, and the dimension of the multi-scale feature vector is [( 4 m+1 1 )/ 3 ] For the CNN in each scale, we use the Caffe [14] implementation of Krizhevsky s CNN [6] except that the number of the nodes in the output layer is set to the number of classes in each task listed in Table 1. When using the Caffe [14] implementation, we adopt its default training parameters for training the CNN for ImageNet [7] classification unless otherwise specified. The pre-defined image size k k is set to according to the Caffe [14] implementation. The details of the training approach and how to make prediction are explained in Sec Training Approach Before training, we generate a training set S i for each scale i from the original training set S of the task of interest T with the assumption that T satisfies LI property. We take the following steps: 1. We place a 2 i 2 i grid on each image of S, crop out 4 i sub-images, and resize each cropped image to k k. 2. Due to LI property, each cropped resized image is assigned the same label as that of the original image from which it is cropped. After the above two steps, the generated training set S i consists of 4 i S labeled images ( S is the number of images in S), and the size of each image is k k. We follow the Caffe [14] implementation, using k = 256. In training phase, adopting the Caffe reference model [14] (denoted as M ImageNet ) for ImageNet [7] classification, we train the CNN in scale i using the training set S i. We follow the descriptions and setting of supervised pre-training and fine-tuning used in Agrawal s work [1], where pre-training with M D means using a data-rich auxiliary dataset D to initialize the CNN parameters and fine-tuning means that all the CNN parameters can be updated by continued training on

4 Fig. 3. The illustration of multi-scale convolutional neural networks (MSCNN) which consists of m + 1 CNN (one for each of the m + 1 different scales). We use each CNN as a feature extractor in that scale. We only show m = 1 here for clarity. In scale i, we crop 4 i sub-images from the input image and resize each of them to The CNN in scale i takes each of the 4 i cropped resized images as input at each time and outputs a 4096-d vector for each input. The final multi-scale feature vector of the input image is generated by cascading all the 4096-d vectors in m + 1 scales. The details of the MSCNN architecture and the training approach are explained in Sec. 3.2 and Sec. 3.3 respectively. the corresponding training set. For the CNN in each scale i, we pre-train it with M ImageNet and fine-tune it with S i. After finishing training all m + 1 CNN in m + 1 scales, we extract the [( 4 m+1 1 )/ 3 ] 4096-d multi-scale feature vector of each training image in S using the method described in Sec With these multi-scale feature vectors for training, we use support vector machine (SVM) to train a linear classifier supervised by the labels of the images in S. In practice, we use LIBSVM [15] to do so with the cost (parameter C in SVM) set to the default value 1. Trying different C values, we find that different C values produce similar accuracy, so we just use the default value. [( In testing phase, given a testing image, we generate its 4 m+1 1 )/ 3 ] 4096-d multi-scale feature vector using the method described in Sec After that, we feed the feature vector of the testing image as the input of the trained SVM classifier, and the output of the SVM classifier is the predicted label of the testing image. Compared with the traditional CNN (m = 0), MSCNN has m + 1 times of the number of parameters to train. However, under LI assumption, we can generate exponentially more training examples (4 i S training examples in scale i) from the original training set S without the need to explicitly collect new training examples. For any two training examples I i and I j from S i and S j respectively (i < j), the overlapping content cannot exceed a quarter of the area of I i. Therefore, under LI assumption, our training approach not only generates exponentially more training examples but also guarantees certain diversity among the generated training examples. In terms of the training time, since S i contains 4 i S training examples, the time needed to train the CNN in scale i is 4 i t 0, where t 0 is the training time of the traditional CNN framework. The training time of the linear SVM classifier is far less than that of CNN, so it is negligible in the total training time. If we train the m + 1 CNN in m + 1 different scales in parallel, the total training time of MSCNN will be 4 m t 0. When training CNN, the traditional method implemented by Caffe [14] resizes each training image to a pre-defined resolution because of the hardware limitation of GPU. Due to the resizing, partial information in the original highresolution training image is lost. In MSCNN, more details in the original high-resolution training image are preserved in scale i (i > 0) because all the training examples in S i ( i {0, 1,, m}) are resized to the same resolution. The extra information preserved in scale i (i > 0) makes MSCNN outperform the traditional CNN framework on the tasks satisfying LI property, which is shown in Sec. 4. Our proposed MSCNN extracts the features in different scales with scale-dependent CNN parameters. In other words, the CNN parameters in scale i are specifically learned from the training images in scale i (the images in S i ). This training approach solves the issue of previous works [2, 3] which simply generate multi-scale features from the same set of CNN parameters that may not be the most suitable one for each scale. In addition, MSCNN extracts the features from every portion of the image in each scale, which solves the drawback of Lu s work [11] that does not fully utilize the information of the entire image in the scale of fine-grained view. Applying our proposed MSCNN to the tasks listed in Table 1, we report the performance in Sec. 4.

5 Fig. 4. The performance comparison for the task ARTIST- CLS. The results show that our proposed methods (MSCNN- 1 and MSCNN-2) outperform both Khan s method [13] and the traditional CNN framework (MSCNN-0), which supports that MSCNN outperforms the traditional CNN framework on the tasks satisfying LI property. Fig. 5. The performance comparison for the task ART-CLS. The results show that our proposed methods (MSCNN-1 to MSCNN-3) outperform both Khan s method [13] and the traditional CNN framework (MSCNN-0), which supports that MSCNN outperforms the traditional CNN framework on the tasks which are LI. 4. EXPERIMENTAL RESULTS Using the experimental setting specified in Table 1, we compare the performance of MSCNN with that of the traditional CNN framework (m = 0) and that of the state-of-theart methods. We use the notation MSCNN-m to represent the MSCNN framework consisting of m + 1 CNN in m + 1 different scales (scale 0 to scale m). MSCNN-0 represents the traditional CNN framework which extracts the features in only one scale. The results of each task are explained in the following paragraphs. ARTIST-CLS: Fig. 4 compares the performance of Khan s method [13], the traditional CNN framework (MSCNN-0) and our proposed methods (MSCNN-1 and MSCNN-2), showing that our proposed methods outperform both the current state-of-the-art method and the traditional CNN framework. Since ARTIST-CLS satisfies LI property, this result supports that MSCNN outperforms the traditional CNN framework on the tasks which are LI. Fig. 4 also shows that MSCNN-2 performs worse than MSCNN- 1, which indicates that increasing the number of scales in MSCNN may not always increase the accuracy. This is reasonable because the training images in scale i (the images in S i ) are less likely to contain useful features as i increases. Therefore, the accuracy increases the most when we increase the number of scales in MSCNN to a certain level. Although MSCNN-2 performs worse than MSCNN-1, both MSCNN-1 and MSCNN-2 outperform MSCNN-0, which supports that using multi-scale features is better than using the features in one scale for the tasks satisfying LI property. ART-CLS: Fig. 5 compares the performance of Khan s method [13] and MSCNN-0 to MSCNN-3, showing that our proposed methods (MSCNN-1 to MSCNN-3) outperform both the current state-of-the-art method and the traditional CNN framework (MSCNN-0). Because ART-CLS mostly Fig. 6. The performance comparison for the task CALTECH101. The results show that our proposed method (MSCNN-1) performs worse than the traditional CNN framework (MSCNN-0) because CALTECH101 is mostly not LI, which indicates that MSCNN does not work for the tasks which are not LI. satisfies LI property, this result supports that MSCNN outperforms the traditional CNN framework on the tasks satisfying LI property. Fig. 5 also supports the following two arguments: 1: The accuracy increases the most when we increase the number of scales in MSCNN to a certain level. 2: Using multi-scale features outperforms using the features in one scale for the tasks which are LI. CALTECH101: Fig. 6 compares the performance of Zeiler s work [4], MSCNN-0, and MSCNN-1, showing that Zeiler s work [4] and MSCNN-0 are comparable in accuracy. However, Fig. 6 shows that MSCNN-1 performs worse than MSCNN-0. This result is not surprising because CALTECH101 is mostly not LI and our proposed framework is based on LI assumption. Fig. 6 indicates that MSCNN does not work for the tasks which are not LI. ARC-CLS: Fig. 7 compares the performance of Xu s method [12] and MSCNN-0 to MSCNN-2, showing that our proposed methods (MSCNN-1 and MSCNN-2) outperform both the current state-of-the-art method and the traditional CNN framework (MSCNN-0). Since ARC-CLS

6 Fig. 7. The performance comparison for the task ARC- CLS. The results show that our proposed methods (MSCNN- 1 and MSCNN-2) outperform both Xu s method [12] and the traditional CNN framework (MSCNN-0), which supports that MSCNN still outperforms the traditional CNN framework on the tasks which are partially LI. only partially satisfies LI property, this result supports that MSCNN can still outperform the traditional CNN framework on the tasks which are only partially LI. Fig. 7 also supports that using multi-scale features outperforms using the features in only one scale for the tasks which are partially LI. Based on our experimental results, we show that MSCNN outperforms the traditional CNN framework on the tasks which are LI. Even when LI property is only partially satisfied, MSCNN can still outperform the traditional CNN framework. In addition, our results indicate that for a task satisfying LI property, there is a most suitable number of scales for MSCNN such that the accuracy can increase the most. 5. CONCLUSION We addressed the label-inheritable (LI) property of the classification tasks and proposed a novel framework, multi-scale convolutional neural networks (MSCNN), for the tasks satisfying LI property. We also proposed a training method for MSCNN under LI assumption such that exponentially more training examples can be generated without the need to collect new training data. MSCNN not only solves the issues of the previous methods which use CNN to extract multiscale features but also outperforms both the state-of-the-art methods and the traditional CNN framework on two LI tasks, artist and artistic style classification. Our experimental results also show that MSCNN can still outperform both the state-ofthe-art method and the traditional CNN framework even on architectural style classification which is only partially LI. 6. REFERENCES [1] P. Agrawal, R. Girshick, and J. Malik, Analyzing the performance of multilayer neural networks for object recognition, in ECCV, 2014, pp [2] Y. Gong, L. Wang, R. Guo, and S. Lazebnik, Multiscale orderless pooling of deep convolutional activation features, in ECCV, 2014, pp [3] K. He, X. Zhang, S. Ren, and J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, in ECCV, 2014, pp [4] M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, in ECCV, 2014, pp [5] N. Zhang, J. Donahue, R. Girshick, and T. Darrell, Part-based R-CNNs for fine-grained category detection, in ECCV, 2014, pp [6] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems 25, pp [7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and F.- F. Li, Imagenet: A large-scale hierarchical image database, in CVPR, 2009, pp [8] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell, DeCAF: A deep convolutional activation feature for generic visual recognition, CoRR, vol. abs/ , [9] F.-F. Li, R. Fergus, and P. Perona, Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories, in CVPRW, [10] G. Griffin, A. Holub, and P. Perona, Caltech-256 object category dataset, [11] X. Lu, Z. Lin, H. Jin, J. Yang, and J. Z. Wang, RAPID: rating pictorial aesthetics using deep learning, in ACMMM, [12] Z. Xu, D. Tao, Y. Zhang, J. Wu, and A. C. Tsoi, Architectural style classification using multinomial latent logistic regression, in ECCV, 2014, pp [13] F. S. Khan, S. Beigpour, J. V. D. Weijer, and M. Felsberg, Painting-91: a large scale database for computational painting categorization, Machine Vision and Applications, vol. 25, pp , [14] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, Caffe: Convolutional architecture for fast feature embedding, arxiv preprint arxiv: , [15] C.-C. Chang and C.-J. Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 27:1 27:27, 2011, Software available at ntu.edu.tw/ cjlin/libsvm.

Part Localization by Exploiting Deep Convolutional Networks

Part Localization by Exploiting Deep Convolutional Networks Part Localization by Exploiting Deep Convolutional Networks Marcel Simon, Erik Rodner, and Joachim Denzler Computer Vision Group, Friedrich Schiller University of Jena, Germany www.inf-cv.uni-jena.de Abstract.

More information

arxiv: v1 [cs.cv] 6 Jul 2016

arxiv: v1 [cs.cv] 6 Jul 2016 arxiv:607.079v [cs.cv] 6 Jul 206 Deep CORAL: Correlation Alignment for Deep Domain Adaptation Baochen Sun and Kate Saenko University of Massachusetts Lowell, Boston University Abstract. Deep neural networks

More information

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION Kingsley Kuan 1, Gaurav Manek 1, Jie Lin 1, Yuan Fang 1, Vijay Chandrasekhar 1,2 Institute for Infocomm Research, A*STAR, Singapore 1 Nanyang Technological

More information

Robust Scene Classification with Cross-level LLC Coding on CNN Features

Robust Scene Classification with Cross-level LLC Coding on CNN Features Robust Scene Classification with Cross-level LLC Coding on CNN Features Zequn Jie 1, Shuicheng Yan 2 1 Keio-NUS CUTE Center, National University of Singapore, Singapore 2 Department of Electrical and Computer

More information

Deep Neural Networks:

Deep Neural Networks: Deep Neural Networks: Part II Convolutional Neural Network (CNN) Yuan-Kai Wang, 2016 Web site of this course: http://pattern-recognition.weebly.com source: CNN for ImageClassification, by S. Lazebnik,

More information

In Defense of Fully Connected Layers in Visual Representation Transfer

In Defense of Fully Connected Layers in Visual Representation Transfer In Defense of Fully Connected Layers in Visual Representation Transfer Chen-Lin Zhang, Jian-Hao Luo, Xiu-Shen Wei, Jianxin Wu National Key Laboratory for Novel Software Technology, Nanjing University,

More information

Comparison of Fine-tuning and Extension Strategies for Deep Convolutional Neural Networks

Comparison of Fine-tuning and Extension Strategies for Deep Convolutional Neural Networks Comparison of Fine-tuning and Extension Strategies for Deep Convolutional Neural Networks Nikiforos Pittaras 1, Foteini Markatopoulou 1,2, Vasileios Mezaris 1, and Ioannis Patras 2 1 Information Technologies

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Announcements Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Seminar registration period starts on Friday We will offer a lab course in the summer semester Deep Robot Learning Topic:

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period starts

More information

Channel Locality Block: A Variant of Squeeze-and-Excitation

Channel Locality Block: A Variant of Squeeze-and-Excitation Channel Locality Block: A Variant of Squeeze-and-Excitation 1 st Huayu Li Northern Arizona University Flagstaff, United State Northern Arizona University hl459@nau.edu arxiv:1901.01493v1 [cs.lg] 6 Jan

More information

Content-Based Image Recovery

Content-Based Image Recovery Content-Based Image Recovery Hong-Yu Zhou and Jianxin Wu National Key Laboratory for Novel Software Technology Nanjing University, China zhouhy@lamda.nju.edu.cn wujx2001@nju.edu.cn Abstract. We propose

More information

Real-time Object Detection CS 229 Course Project

Real-time Object Detection CS 229 Course Project Real-time Object Detection CS 229 Course Project Zibo Gong 1, Tianchang He 1, and Ziyi Yang 1 1 Department of Electrical Engineering, Stanford University December 17, 2016 Abstract Objection detection

More information

Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material

Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material Charles R. Qi Hao Su Matthias Nießner Angela Dai Mengyuan Yan Leonidas J. Guibas Stanford University 1. Details

More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun Presented by Tushar Bansal Objective 1. Get bounding box for all objects

More information

Multi-Glance Attention Models For Image Classification

Multi-Glance Attention Models For Image Classification Multi-Glance Attention Models For Image Classification Chinmay Duvedi Stanford University Stanford, CA cduvedi@stanford.edu Pararth Shah Stanford University Stanford, CA pararth@stanford.edu Abstract We

More information

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Deep learning for object detection. Slides from Svetlana Lazebnik and many others Deep learning for object detection Slides from Svetlana Lazebnik and many others Recent developments in object detection 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before deep

More information

Kaggle Data Science Bowl 2017 Technical Report

Kaggle Data Science Bowl 2017 Technical Report Kaggle Data Science Bowl 2017 Technical Report qfpxfd Team May 11, 2017 1 Team Members Table 1: Team members Name E-Mail University Jia Ding dingjia@pku.edu.cn Peking University, Beijing, China Aoxue Li

More information

Proceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol I IMECS 2018, March 14-16, 2018, Hong Kong

Proceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol I IMECS 2018, March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong TABLE I CLASSIFICATION ACCURACY OF DIFFERENT PRE-TRAINED MODELS ON THE TEST DATA

More information

The Treasure beneath Convolutional Layers: Cross-convolutional-layer Pooling for Image Classification

The Treasure beneath Convolutional Layers: Cross-convolutional-layer Pooling for Image Classification The Treasure beneath Convolutional Layers: Cross-convolutional-layer Pooling for Image Classification Lingqiao Liu 1, Chunhua Shen 1,2, Anton van den Hengel 1,2 The University of Adelaide, Australia 1

More information

Object Detection Based on Deep Learning

Object Detection Based on Deep Learning Object Detection Based on Deep Learning Yurii Pashchenko AI Ukraine 2016, Kharkiv, 2016 Image classification (mostly what you ve seen) http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf

More information

RoI Pooling Based Fast Multi-Domain Convolutional Neural Networks for Visual Tracking

RoI Pooling Based Fast Multi-Domain Convolutional Neural Networks for Visual Tracking Advances in Intelligent Systems Research, volume 133 2nd International Conference on Artificial Intelligence and Industrial Engineering (AIIE2016) RoI Pooling Based Fast Multi-Domain Convolutional Neural

More information

Automatic detection of books based on Faster R-CNN

Automatic detection of books based on Faster R-CNN Automatic detection of books based on Faster R-CNN Beibei Zhu, Xiaoyu Wu, Lei Yang, Yinghua Shen School of Information Engineering, Communication University of China Beijing, China e-mail: zhubeibei@cuc.edu.cn,

More information

Aggregating Descriptors with Local Gaussian Metrics

Aggregating Descriptors with Local Gaussian Metrics Aggregating Descriptors with Local Gaussian Metrics Hideki Nakayama Grad. School of Information Science and Technology The University of Tokyo Tokyo, JAPAN nakayama@ci.i.u-tokyo.ac.jp Abstract Recently,

More information

Recognize Complex Events from Static Images by Fusing Deep Channels Supplementary Materials

Recognize Complex Events from Static Images by Fusing Deep Channels Supplementary Materials Recognize Complex Events from Static Images by Fusing Deep Channels Supplementary Materials Yuanjun Xiong 1 Kai Zhu 1 Dahua Lin 1 Xiaoou Tang 1,2 1 Department of Information Engineering, The Chinese University

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning for Object Categorization 14.01.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period

More information

DeCAF: a Deep Convolutional Activation Feature for Generic Visual Recognition

DeCAF: a Deep Convolutional Activation Feature for Generic Visual Recognition DeCAF: a Deep Convolutional Activation Feature for Generic Visual Recognition ECS 289G 10/06/2016 Authors: Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng and Trevor Darrell

More information

Video Gesture Recognition with RGB-D-S Data Based on 3D Convolutional Networks

Video Gesture Recognition with RGB-D-S Data Based on 3D Convolutional Networks Video Gesture Recognition with RGB-D-S Data Based on 3D Convolutional Networks August 16, 2016 1 Team details Team name FLiXT Team leader name Yunan Li Team leader address, phone number and email address:

More information

IDENTIFYING PHOTOREALISTIC COMPUTER GRAPHICS USING CONVOLUTIONAL NEURAL NETWORKS

IDENTIFYING PHOTOREALISTIC COMPUTER GRAPHICS USING CONVOLUTIONAL NEURAL NETWORKS IDENTIFYING PHOTOREALISTIC COMPUTER GRAPHICS USING CONVOLUTIONAL NEURAL NETWORKS In-Jae Yu, Do-Guk Kim, Jin-Seok Park, Jong-Uk Hou, Sunghee Choi, and Heung-Kyu Lee Korea Advanced Institute of Science and

More information

YOLO9000: Better, Faster, Stronger

YOLO9000: Better, Faster, Stronger YOLO9000: Better, Faster, Stronger Date: January 24, 2018 Prepared by Haris Khan (University of Toronto) Haris Khan CSC2548: Machine Learning in Computer Vision 1 Overview 1. Motivation for one-shot object

More information

CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015

CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015 CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015 Etienne Gadeski, Hervé Le Borgne, and Adrian Popescu CEA, LIST, Laboratory of Vision and Content Engineering, France

More information

An Exploration of Computer Vision Techniques for Bird Species Classification

An Exploration of Computer Vision Techniques for Bird Species Classification An Exploration of Computer Vision Techniques for Bird Species Classification Anne L. Alter, Karen M. Wang December 15, 2017 Abstract Bird classification, a fine-grained categorization task, is a complex

More information

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK 1 Po-Jen Lai ( 賴柏任 ), 2 Chiou-Shann Fuh ( 傅楸善 ) 1 Dept. of Electrical Engineering, National Taiwan University, Taiwan 2 Dept.

More information

arxiv: v1 [cs.cv] 31 Mar 2016

arxiv: v1 [cs.cv] 31 Mar 2016 Object Boundary Guided Semantic Segmentation Qin Huang, Chunyang Xia, Wenchao Zheng, Yuhang Song, Hao Xu and C.-C. Jay Kuo arxiv:1603.09742v1 [cs.cv] 31 Mar 2016 University of Southern California Abstract.

More information

Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C ǂ

Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C ǂ Stop Line Detection and Distance Measurement for Road Intersection based on Deep Learning Neural Network Guan-Ting Lin 1, Patrisia Sherryl Santoso *1, Che-Tsung Lin *ǂ, Chia-Chi Tsai and Jiun-In Guo National

More information

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides Deep Learning in Visual Recognition Thanks Da Zhang for the slides Deep Learning is Everywhere 2 Roadmap Introduction Convolutional Neural Network Application Image Classification Object Detection Object

More information

Depth Estimation from a Single Image Using a Deep Neural Network Milestone Report

Depth Estimation from a Single Image Using a Deep Neural Network Milestone Report Figure 1: The architecture of the convolutional network. Input: a single view image; Output: a depth map. 3 Related Work In [4] they used depth maps of indoor scenes produced by a Microsoft Kinect to successfully

More information

Return of the Devil in the Details: Delving Deep into Convolutional Nets

Return of the Devil in the Details: Delving Deep into Convolutional Nets Return of the Devil in the Details: Delving Deep into Convolutional Nets Ken Chatfield - Karen Simonyan - Andrea Vedaldi - Andrew Zisserman University of Oxford The Devil is still in the Details 2011 2014

More information

Regionlet Object Detector with Hand-crafted and CNN Feature

Regionlet Object Detector with Hand-crafted and CNN Feature Regionlet Object Detector with Hand-crafted and CNN Feature Xiaoyu Wang Research Xiaoyu Wang Research Ming Yang Horizon Robotics Shenghuo Zhu Alibaba Group Yuanqing Lin Baidu Overview of this section Regionlet

More information

Efficient Convolutional Network Learning using Parametric Log based Dual-Tree Wavelet ScatterNet

Efficient Convolutional Network Learning using Parametric Log based Dual-Tree Wavelet ScatterNet Efficient Convolutional Network Learning using Parametric Log based Dual-Tree Wavelet ScatterNet Amarjot Singh, Nick Kingsbury Signal Processing Group, Department of Engineering, University of Cambridge,

More information

A Deep Learning Framework for Authorship Classification of Paintings

A Deep Learning Framework for Authorship Classification of Paintings A Deep Learning Framework for Authorship Classification of Paintings Kai-Lung Hua ( 花凱龍 ) Dept. of Computer Science and Information Engineering National Taiwan University of Science and Technology Taipei,

More information

Transfer Learning. Style Transfer in Deep Learning

Transfer Learning. Style Transfer in Deep Learning Transfer Learning & Style Transfer in Deep Learning 4-DEC-2016 Gal Barzilai, Ram Machlev Deep Learning Seminar School of Electrical Engineering Tel Aviv University Part 1: Transfer Learning in Deep Learning

More information

Tiny ImageNet Visual Recognition Challenge

Tiny ImageNet Visual Recognition Challenge Tiny ImageNet Visual Recognition Challenge Ya Le Department of Statistics Stanford University yle@stanford.edu Xuan Yang Department of Electrical Engineering Stanford University xuany@stanford.edu Abstract

More information

A HMAX with LLC for Visual Recognition

A HMAX with LLC for Visual Recognition A HMAX with LLC for Visual Recognition Kean Hong Lau, Yong Haur Tay, Fook Loong Lo Centre for Computing and Intelligent System Universiti Tunku Abdul Rahman, Kuala Lumpur, Malaysia {laukh,tayyh,lofl}@utar.edu.my

More information

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab. [ICIP 2017] Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab., POSTECH Pedestrian Detection Goal To draw bounding boxes that

More information

A Novel Representation and Pipeline for Object Detection

A Novel Representation and Pipeline for Object Detection A Novel Representation and Pipeline for Object Detection Vishakh Hegde Stanford University vishakh@stanford.edu Manik Dhar Stanford University dmanik@stanford.edu Abstract Object detection is an important

More information

Cascade Region Regression for Robust Object Detection

Cascade Region Regression for Robust Object Detection Large Scale Visual Recognition Challenge 2015 (ILSVRC2015) Cascade Region Regression for Robust Object Detection Jiankang Deng, Shaoli Huang, Jing Yang, Hui Shuai, Zhengbo Yu, Zongguang Lu, Qiang Ma, Yali

More information

arxiv: v1 [cs.cv] 29 Sep 2016

arxiv: v1 [cs.cv] 29 Sep 2016 arxiv:1609.09545v1 [cs.cv] 29 Sep 2016 Two-stage Convolutional Part Heatmap Regression for the 1st 3D Face Alignment in the Wild (3DFAW) Challenge Adrian Bulat and Georgios Tzimiropoulos Computer Vision

More information

Spatial Localization and Detection. Lecture 8-1

Spatial Localization and Detection. Lecture 8-1 Lecture 8: Spatial Localization and Detection Lecture 8-1 Administrative - Project Proposals were due on Saturday Homework 2 due Friday 2/5 Homework 1 grades out this week Midterm will be in-class on Wednesday

More information

Pedestrian Detection with Deep Convolutional Neural Network

Pedestrian Detection with Deep Convolutional Neural Network Pedestrian Detection with Deep Convolutional Neural Network Xiaogang Chen, Pengxu Wei, Wei Ke, Qixiang Ye, Jianbin Jiao School of Electronic,Electrical and Communication Engineering, University of Chinese

More information

Learning a Representative and Discriminative Part Model with Deep Convolutional Features for Scene Recognition

Learning a Representative and Discriminative Part Model with Deep Convolutional Features for Scene Recognition Learning a Representative and Discriminative Part Model with Deep Convolutional Features for Scene Recognition Bingyuan Liu, Jing Liu, Jingqiao Wang, Hanqing Lu Institute of Automation, Chinese Academy

More information

CS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016

CS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016 CS 2750: Machine Learning Neural Networks Prof. Adriana Kovashka University of Pittsburgh April 13, 2016 Plan for today Neural network definition and examples Training neural networks (backprop) Convolutional

More information

R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection

R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18) R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection Zeming Li, 1 Yilun Chen, 2 Gang Yu, 2 Yangdong

More information

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK Wenjie Guan, YueXian Zou*, Xiaoqun Zhou ADSPLAB/Intelligent Lab, School of ECE, Peking University, Shenzhen,518055, China

More information

Cultural Event Recognition by Subregion Classification with Convolutional Neural Network

Cultural Event Recognition by Subregion Classification with Convolutional Neural Network Cultural Event Recognition by Subregion Classification with Convolutional Neural Network Sungheon Park and Nojun Kwak Graduate School of CST, Seoul National University Seoul, Korea {sungheonpark,nojunk}@snu.ac.kr

More information

Groupout: A Way to Regularize Deep Convolutional Neural Network

Groupout: A Way to Regularize Deep Convolutional Neural Network Groupout: A Way to Regularize Deep Convolutional Neural Network Eunbyung Park Department of Computer Science University of North Carolina at Chapel Hill eunbyung@cs.unc.edu Abstract Groupout is a new technique

More information

Unified, real-time object detection

Unified, real-time object detection Unified, real-time object detection Final Project Report, Group 02, 8 Nov 2016 Akshat Agarwal (13068), Siddharth Tanwar (13699) CS698N: Recent Advances in Computer Vision, Jul Nov 2016 Instructor: Gaurav

More information

Object detection with CNNs

Object detection with CNNs Object detection with CNNs 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before CNNs After CNNs 0% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 year Region proposals

More information

HENet: A Highly Efficient Convolutional Neural. Networks Optimized for Accuracy, Speed and Storage

HENet: A Highly Efficient Convolutional Neural. Networks Optimized for Accuracy, Speed and Storage HENet: A Highly Efficient Convolutional Neural Networks Optimized for Accuracy, Speed and Storage Qiuyu Zhu Shanghai University zhuqiuyu@staff.shu.edu.cn Ruixin Zhang Shanghai University chriszhang96@shu.edu.cn

More information

Real Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications

Real Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications Real Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications Anand Joshi CS229-Machine Learning, Computer Science, Stanford University,

More information

Yiqi Yan. May 10, 2017

Yiqi Yan. May 10, 2017 Yiqi Yan May 10, 2017 P a r t I F u n d a m e n t a l B a c k g r o u n d s Convolution Single Filter Multiple Filters 3 Convolution: case study, 2 filters 4 Convolution: receptive field receptive field

More information

Fast CNN-Based Object Tracking Using Localization Layers and Deep Features Interpolation

Fast CNN-Based Object Tracking Using Localization Layers and Deep Features Interpolation Fast CNN-Based Object Tracking Using Localization Layers and Deep Features Interpolation Al-Hussein A. El-Shafie Faculty of Engineering Cairo University Giza, Egypt elshafie_a@yahoo.com Mohamed Zaki Faculty

More information

Rich feature hierarchies for accurate object detection and semantic segmentation

Rich feature hierarchies for accurate object detection and semantic segmentation Rich feature hierarchies for accurate object detection and semantic segmentation Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik Presented by Pandian Raju and Jialin Wu Last class SGD for Document

More information

Classifying a specific image region using convolutional nets with an ROI mask as input

Classifying a specific image region using convolutional nets with an ROI mask as input Classifying a specific image region using convolutional nets with an ROI mask as input 1 Sagi Eppel Abstract Convolutional neural nets (CNN) are the leading computer vision method for classifying images.

More information

arxiv: v1 [cs.cv] 5 Oct 2015

arxiv: v1 [cs.cv] 5 Oct 2015 Efficient Object Detection for High Resolution Images Yongxi Lu 1 and Tara Javidi 1 arxiv:1510.01257v1 [cs.cv] 5 Oct 2015 Abstract Efficient generation of high-quality object proposals is an essential

More information

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Si Chen The George Washington University sichen@gwmail.gwu.edu Meera Hahn Emory University mhahn7@emory.edu Mentor: Afshin

More information

Mimicking Very Efficient Network for Object Detection

Mimicking Very Efficient Network for Object Detection Mimicking Very Efficient Network for Object Detection Quanquan Li 1, Shengying Jin 2, Junjie Yan 1 1 SenseTime 2 Beihang University liquanquan@sensetime.com, jsychffy@gmail.com, yanjunjie@outlook.com Abstract

More information

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Today s class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization:

More information

Rich feature hierarchies for accurate object detection and semantic segmentation

Rich feature hierarchies for accurate object detection and semantic segmentation Rich feature hierarchies for accurate object detection and semantic segmentation BY; ROSS GIRSHICK, JEFF DONAHUE, TREVOR DARRELL AND JITENDRA MALIK PRESENTER; MUHAMMAD OSAMA Object detection vs. classification

More information

arxiv: v1 [cs.cv] 4 Dec 2014

arxiv: v1 [cs.cv] 4 Dec 2014 Convolutional Neural Networks at Constrained Time Cost Kaiming He Jian Sun Microsoft Research {kahe,jiansun}@microsoft.com arxiv:1412.1710v1 [cs.cv] 4 Dec 2014 Abstract Though recent advanced convolutional

More information

arxiv: v1 [cs.cv] 15 Oct 2018

arxiv: v1 [cs.cv] 15 Oct 2018 Vehicle classification using ResNets, localisation and spatially-weighted pooling Rohan Watkins, Nick Pears, Suresh Manandhar Department of Computer Science, University of York, York, UK arxiv:1810.10329v1

More information

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University.

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University. Visualizing and Understanding Convolutional Networks Christopher Pennsylvania State University February 23, 2015 Some Slide Information taken from Pierre Sermanet (Google) presentation on and Computer

More information

Layerwise Interweaving Convolutional LSTM

Layerwise Interweaving Convolutional LSTM Layerwise Interweaving Convolutional LSTM Tiehang Duan and Sargur N. Srihari Department of Computer Science and Engineering The State University of New York at Buffalo Buffalo, NY 14260, United States

More information

DFT-based Transformation Invariant Pooling Layer for Visual Classification

DFT-based Transformation Invariant Pooling Layer for Visual Classification DFT-based Transformation Invariant Pooling Layer for Visual Classification Jongbin Ryu 1, Ming-Hsuan Yang 2, and Jongwoo Lim 1 1 Hanyang University 2 University of California, Merced Abstract. We propose

More information

arxiv: v2 [cs.cv] 23 May 2016

arxiv: v2 [cs.cv] 23 May 2016 Localizing by Describing: Attribute-Guided Attention Localization for Fine-Grained Recognition arxiv:1605.06217v2 [cs.cv] 23 May 2016 Xiao Liu Jiang Wang Shilei Wen Errui Ding Yuanqing Lin Baidu Research

More information

Traffic Multiple Target Detection on YOLOv2

Traffic Multiple Target Detection on YOLOv2 Traffic Multiple Target Detection on YOLOv2 Junhong Li, Huibin Ge, Ziyang Zhang, Weiqin Wang, Yi Yang Taiyuan University of Technology, Shanxi, 030600, China wangweiqin1609@link.tyut.edu.cn Abstract Background

More information

arxiv: v1 [cs.lg] 20 Dec 2013

arxiv: v1 [cs.lg] 20 Dec 2013 Unsupervised Feature Learning by Deep Sparse Coding Yunlong He Koray Kavukcuoglu Yun Wang Arthur Szlam Yanjun Qi arxiv:1312.5783v1 [cs.lg] 20 Dec 2013 Abstract In this paper, we propose a new unsupervised

More information

EFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS. Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang

EFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS. Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang EFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang Image Formation and Processing (IFP) Group, University of Illinois at Urbana-Champaign

More information

International Journal of Computer Engineering and Applications, Volume XII, Special Issue, September 18,

International Journal of Computer Engineering and Applications, Volume XII, Special Issue, September 18, REAL-TIME OBJECT DETECTION WITH CONVOLUTION NEURAL NETWORK USING KERAS Asmita Goswami [1], Lokesh Soni [2 ] Department of Information Technology [1] Jaipur Engineering College and Research Center Jaipur[2]

More information

arxiv: v1 [cs.mm] 12 Jan 2016

arxiv: v1 [cs.mm] 12 Jan 2016 Learning Subclass Representations for Visually-varied Image Classification Xinchao Li, Peng Xu, Yue Shi, Martha Larson, Alan Hanjalic Multimedia Information Retrieval Lab, Delft University of Technology

More information

Saliency-guided Selective Magnification for Company Logo Detection

Saliency-guided Selective Magnification for Company Logo Detection Saliency-guided Selective Magnification for Company Logo Detection Christian Eggert, Anton Winschel, Dan Zecha, Rainer Lienhart Multimedia Computing and Computer Vision Lab University of Augsburg Augsburg,

More information

CS89/189 Project Milestone: Differentiating CGI from Photographic Images

CS89/189 Project Milestone: Differentiating CGI from Photographic Images CS89/189 Project Milestone: Differentiating CGI from Photographic Images Shruti Agarwal and Liane Makatura February 18, 2016 1 Overview In response to the continuing improvements being made in the realm

More information

Object Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal

Object Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal Object Detection Lecture 10.3 - Introduction to deep learning (CNN) Idar Dyrdal Deep Learning Labels Computational models composed of multiple processing layers (non-linear transformations) Used to learn

More information

Feature-Fused SSD: Fast Detection for Small Objects

Feature-Fused SSD: Fast Detection for Small Objects Feature-Fused SSD: Fast Detection for Small Objects Guimei Cao, Xuemei Xie, Wenzhe Yang, Quan Liao, Guangming Shi, Jinjian Wu School of Electronic Engineering, Xidian University, China xmxie@mail.xidian.edu.cn

More information

Lecture 5: Object Detection

Lecture 5: Object Detection Object Detection CSED703R: Deep Learning for Visual Recognition (2017F) Lecture 5: Object Detection Bohyung Han Computer Vision Lab. bhhan@postech.ac.kr 2 Traditional Object Detection Algorithms Region-based

More information

Multiple VLAD encoding of CNNs for image classification

Multiple VLAD encoding of CNNs for image classification Multiple VLAD encoding of CNNs for image classification Qing Li, Qiang Peng, Chuan Yan 1 arxiv:1707.00058v1 [cs.cv] 30 Jun 2017 Abstract Despite the effectiveness of convolutional neural networks (CNNs)

More information

Handwritten Chinese Character Recognition by Joint Classification and Similarity Ranking

Handwritten Chinese Character Recognition by Joint Classification and Similarity Ranking 2016 15th International Conference on Frontiers in Handwriting Recognition Handwritten Chinese Character Recognition by Joint Classification and Similarity Ranking Cheng Cheng, Xu-Yao Zhang, Xiao-Hu Shao

More information

Real-Time Grasp Detection Using Convolutional Neural Networks

Real-Time Grasp Detection Using Convolutional Neural Networks Real-Time Grasp Detection Using Convolutional Neural Networks Joseph Redmon 1, Anelia Angelova 2 Abstract We present an accurate, real-time approach to robotic grasp detection based on convolutional neural

More information

Efficient Segmentation-Aided Text Detection For Intelligent Robots

Efficient Segmentation-Aided Text Detection For Intelligent Robots Efficient Segmentation-Aided Text Detection For Intelligent Robots Junting Zhang, Yuewei Na, Siyang Li, C.-C. Jay Kuo University of Southern California Outline Problem Definition and Motivation Related

More information

Real-time detection and tracking of pedestrians in CCTV images using a deep convolutional neural network

Real-time detection and tracking of pedestrians in CCTV images using a deep convolutional neural network Real-time detection and tracking of pedestrians in CCTV images using a deep convolutional neural network 1 Introduction Debaditya Acharya acharyad@studentunimelbeduau Stephan Winter winter@unimelbeduau

More information

arxiv: v3 [cs.cv] 29 Mar 2017

arxiv: v3 [cs.cv] 29 Mar 2017 EVOLVING BOXES FOR FAST VEHICLE DETECTION Li Wang 1,2, Yao Lu 3, Hong Wang 2, Yingbin Zheng 2, Hao Ye 2, Xiangyang Xue 1 arxiv:1702.00254v3 [cs.cv] 29 Mar 2017 1 Shanghai Key Lab of Intelligent Information

More information

SELF SUPERVISED DEEP REPRESENTATION LEARNING FOR FINE-GRAINED BODY PART RECOGNITION

SELF SUPERVISED DEEP REPRESENTATION LEARNING FOR FINE-GRAINED BODY PART RECOGNITION SELF SUPERVISED DEEP REPRESENTATION LEARNING FOR FINE-GRAINED BODY PART RECOGNITION Pengyue Zhang Fusheng Wang Yefeng Zheng Medical Imaging Technologies, Siemens Medical Solutions USA Inc., Princeton,

More information

Towards Real-Time Automatic Number Plate. Detection: Dots in the Search Space

Towards Real-Time Automatic Number Plate. Detection: Dots in the Search Space Towards Real-Time Automatic Number Plate Detection: Dots in the Search Space Chi Zhang Department of Computer Science and Technology, Zhejiang University wellyzhangc@zju.edu.cn Abstract Automatic Number

More information

HIERARCHICAL JOINT-GUIDED NETWORKS FOR SEMANTIC IMAGE SEGMENTATION

HIERARCHICAL JOINT-GUIDED NETWORKS FOR SEMANTIC IMAGE SEGMENTATION HIERARCHICAL JOINT-GUIDED NETWORKS FOR SEMANTIC IMAGE SEGMENTATION Chien-Yao Wang, Jyun-Hong Li, Seksan Mathulaprangsan, Chin-Chin Chiang, and Jia-Ching Wang Department of Computer Science and Information

More information

arxiv: v1 [cs.cv] 16 Nov 2015

arxiv: v1 [cs.cv] 16 Nov 2015 Coarse-to-fine Face Alignment with Multi-Scale Local Patch Regression Zhiao Huang hza@megvii.com Erjin Zhou zej@megvii.com Zhimin Cao czm@megvii.com arxiv:1511.04901v1 [cs.cv] 16 Nov 2015 Abstract Facial

More information

Tracking by recognition using neural network

Tracking by recognition using neural network Zhiliang Zeng, Ying Kin Yu, and Kin Hong Wong,"Tracking by recognition using neural network",19th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed

More information

Classification of objects from Video Data (Group 30)

Classification of objects from Video Data (Group 30) Classification of objects from Video Data (Group 30) Sheallika Singh 12665 Vibhuti Mahajan 12792 Aahitagni Mukherjee 12001 M Arvind 12385 1 Motivation Video surveillance has been employed for a long time

More information

Application of Convolutional Neural Network for Image Classification on Pascal VOC Challenge 2012 dataset

Application of Convolutional Neural Network for Image Classification on Pascal VOC Challenge 2012 dataset Application of Convolutional Neural Network for Image Classification on Pascal VOC Challenge 2012 dataset Suyash Shetty Manipal Institute of Technology suyash.shashikant@learner.manipal.edu Abstract In

More information

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma Mask R-CNN presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma Mask R-CNN Background Related Work Architecture Experiment Mask R-CNN Background Related Work Architecture Experiment Background From left

More information

Study of Residual Networks for Image Recognition

Study of Residual Networks for Image Recognition Study of Residual Networks for Image Recognition Mohammad Sadegh Ebrahimi Stanford University sadegh@stanford.edu Hossein Karkeh Abadi Stanford University hosseink@stanford.edu Abstract Deep neural networks

More information

InterActive: Inter-Layer Activeness Propagation

InterActive: Inter-Layer Activeness Propagation 2016 IEEE Conference on Computer Vision and Pattern Recognition InterActive: Inter-Layer Activeness Propagation Lingxi Xie 1 Liang Zheng 2 Jingdong Wang 3 Alan Yuille 4 Qi Tian 5 1,4 Department of Statistics,

More information