Rare Chinese Character Recognition by Radical Extraction Network


2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff Center, Banff, Canada, October 5-8, 2017

Ziang Yan, Chengzhe Yan, Changshui Zhang
Department of Automation, Tsinghua University
State Key Lab of Intelligent Technologies and Systems
Tsinghua National Laboratory for Information Science and Technology (TNList)
Beijing, P. R. China

Abstract: Building a modern Optical Character Recognition (OCR) system for Chinese is hard due to the large Chinese vocabulary. Training images for rare Chinese characters are extremely expensive to obtain. Radical-based OCR systems tackle this problem by first extracting and recognizing the basic graphical components (i.e., radicals) of a Chinese character. However, reliably recognizing radicals remains an open challenge. In this paper, we propose a novel Radical Extraction Network (REN) to extract and recognize radicals using a deep Convolutional Neural Network (CNN). REN is end-to-end trainable, and it needs less hand-tuning than previous segmentation-based approaches. Deep appearance models for radicals are learned from data in a weakly supervised fashion, and no radical-level annotations are required. We learn to recognize different radicals on commonly used Chinese characters, and transfer the learned deep appearance models to rarely used Chinese characters. Experimental results show that the proposed method helps the classifier recognize rare Chinese characters.

Index Terms: Radical-based Chinese OCR, Convolutional Neural Network, Weakly Supervised Learning

I. INTRODUCTION

Modern Optical Character Recognition (OCR) systems often use a feed-forward pipeline to detect and recognize text regions in an image. At the heart of an OCR system is a character-level image classifier with high accuracy. Chinese OCR is intricate because it involves a large number of characters and significant similarity between different characters [1]. There are about 50,000 different Chinese characters altogether, and only 6,000 of them are frequently used. Unfortunately, these rare characters often have a great influence on the meaning of a sentence, especially in scientific literature (e.g., chemical literature). For rarely used Chinese characters, collecting enough training examples is costly, especially if a data-hungry deep Convolutional Neural Network (CNN) [2] is used.

Chinese characters are formed by a combination of radicals (i.e., graphical components, or basic shapes) [3]. Many Chinese characters are visually similar because they share the same radical. According to [4], the number of radicals is significantly smaller than the number of different Chinese characters. Thus, Chinese character recognition can be simplified by recognizing radicals and their relative positions. Radical-based Chinese character recognition approaches [3], [5]-[9] recognize a Chinese character by first extracting and recognizing its radicals.

Fig. 1. Radical-based Chinese character recognition. An unseen Chinese character image is first processed by several convolutional layers to generate convolutional feature maps. Then, a radical extractor takes the feature maps as input and recognizes individual radicals. A global feature extractor produces global feature maps based on the convolutional feature maps. Finally, we recognize the whole character by combining radical-level scores and the global feature maps.
The performance of radical recognition is crucial for a radical-based Chinese character recognition system. Despite this progress, learning to recognize radicals from Chinese character images remains a challenging task.

Recently, CNNs have advanced many computer vision fields at a dramatic pace, such as image classification [10], [11], object detection [12], [13], and semantic segmentation [14]. CNNs have an impressive ability to learn better hierarchical visual representations than traditional hand-crafted features such as SIFT [15], SURF [16], Haar-like features [17], and Fisher Vectors [18], since all parameters in a CNN are learned from data. Recent studies on OCR [19], [20] show that CNN representations can also improve the performance of OCR systems. We use a CNN to recognize radicals, since it can learn powerful representations from data.

In this paper, we use a CNN to learn robust appearance models of radicals. Our radical-based OCR system is shown in Figure 1. Aligned radical training images, which are often not readily available for practical Chinese OCR tasks, are time consuming and expensive to obtain for large Chinese character datasets. Compared with radical-level annotations, character-level annotations, indicating which character an image contains, are much easier to collect.

Unlike traditional approaches, which often need aligned radical-level training images to recognize different radicals [3], [21], we learn to localize radicals in a weakly supervised fashion: only character-level images are required in the training process. Because a radical can appear at many different positions and scales within a Chinese character, learning radicals from characters without any radical-level annotation is a highly challenging task. We address this task by incorporating recent progress in the field of weakly supervised object detection (WSD) [22]-[25]. These methods typically start from a set of candidate bounding boxes which may potentially contain objects tightly, and mine positive bounding boxes (i.e., bounding boxes that contain objects) from this set. Edge Boxes [26] and Selective Search [27] are often used to extract candidate bounding boxes from an image. WSDDN by Bilen et al. [22], a state-of-the-art WSD method, learns to select positive bounding boxes and classify different objects simultaneously. For radical recognition, we first extract a set of candidate bounding boxes, and then use REN to recognize radicals.

We build REN upon WSDDN. REN has three data streams: 1) a radical-level classification stream to classify different radicals, 2) a radical-level detection stream to select positive candidate bounding boxes that tightly contain a particular radical, and 3) a character-level classification stream to classify different Chinese characters based on radical-level recognition results. The whole network is end-to-end trainable. In the training process, only character-level images are needed, and REN learns to extract and recognize different radicals from character-level annotations automatically. Since we need less annotation effort per image than traditional approaches [3], [21], REN scales better to larger datasets. Moreover, REN does not rely on radical templates or hand-crafted segmentation strategies; all radical appearance models are learned from data. Experiments on rare Chinese characters show that REN can recognize radicals with high accuracy and improve recognition performance on rare Chinese characters.

Our main contributions are:
- We propose the Radical Extraction Network (REN), an end-to-end trainable deep convolutional neural network to extract and recognize radicals in a Chinese character image.
- REN does not need radical-level annotations of training images: we learn to decompose Chinese characters and discover radicals from only character-level annotations.

Fig. 2. Architecture of Radical Extraction Network: an ROI pooling layer feeds the radical-level classification and detection streams, while a global feature extraction branch feeds the character-level classification stream.

II. METHOD

In this section, we introduce the technical details of the Radical Extraction Network (REN). The architecture of REN is shown in Figure 2. REN takes as input a Chinese character image and a set of candidate bounding boxes. REN has three streams that perform radical-level classification, radical-level detection, and character-level classification, respectively (Section II-B). REN is trained to localize radicals in an end-to-end fashion, under character-level supervision (Section II-C).

A. Notation

We have C different Chinese characters to recognize. Among the C categories, $C_{com}$ categories are frequently used characters, for which we have a large number of training examples. The remaining $C_{rare} = C - C_{com}$ categories are rarely used characters, for which we have only a few training images. In our setting, an example is an image of a single Chinese character.
Let x denote an image. We extract B bounding boxes from x, and denote the set of these bounding boxes by R. Let $C_{rad}$ denote the number of different radicals. We denote by b a bounding box in R, and by r a single radical in image x. We denote by $\phi$ the feature map generated by a particular layer, and by $\theta$ all the weights of REN, including the parameters of all filters and biases.

B. Radical Extraction Network

Extracting radicals from Chinese characters under character-level supervision is essentially a weakly supervised learning problem. For each training image, we know which radicals it contains, but we do not know what these radicals look like or where they are. WSDDN by Bilen et al. [22] is a state-of-the-art weakly supervised object detection method, and it learns to localize and classify objects simultaneously under image-level (i.e., character-level in the Chinese OCR task) supervision. Unlike the object detection task, which usually focuses on object-level (i.e., radical-level in the Chinese OCR task) accuracy, we focus on character-level accuracy in the rare Chinese character recognition task. Thus, REN has one more stream than WSDDN, performing character-level classification, as shown in Figure 2.
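Alongside the image, REN consumes a set of candidate regions R. As a minimal sketch (not the authors' code), such proposals could be generated with the Edge Boxes [26] implementation in OpenCV's ximgproc contrib module; the structured-edge model file path below is a hypothetical placeholder that must be supplied:

```python
import cv2
import numpy as np

def edge_box_proposals(image_bgr: np.ndarray, model_path: str = "model.yml",
                       max_boxes: int = 200) -> np.ndarray:
    """Extract up to `max_boxes` Edge Boxes proposals from an image.

    Requires opencv-contrib-python and a pre-trained structured-edge
    model file (`model_path` here is a hypothetical placeholder).
    """
    detector = cv2.ximgproc.createStructuredEdgeDetection(model_path)
    # detectEdges expects a float32 RGB image with values in [0, 1].
    rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    edges = detector.detectEdges(rgb)
    orimap = detector.computeOrientation(edges)
    edges = detector.edgesNms(edges, orimap)

    eb = cv2.ximgproc.createEdgeBoxes()
    eb.setMaxBoxes(max_boxes)
    # Depending on the OpenCV version, getBoundingBoxes returns either the
    # boxes alone or a (boxes, scores) pair; adjust the unpacking as needed.
    boxes = eb.getBoundingBoxes(edges, orimap)
    return boxes  # each box is (x, y, w, h)
```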

REN is constructed from standard CNN components: convolutional layers, nonlinear layers, fully connected layers, and dropout layers [10]. We insert a ReLU [10] nonlinearity after each convolutional layer and fully connected layer. The inputs of REN consist of two parts: 1) a Chinese character image x, and 2) a set of bounding boxes R extracted from image x. We use Edge Boxes [26] to extract around B candidate bounding boxes for each image, as in WSDDN [22]. When B is large enough (e.g., $B \approx 200$ in our experiments), we believe that for each radical r in character x, at least one bounding box $b \in R$ contains r tightly. In REN, a character image x is first processed by several convolutional layers, producing a convolutional feature map $\phi_{conv}(x;\theta)$. Then we branch off three data streams, described next.

a) Radical-level classification data stream: The first and second data streams operate at the radical level. To achieve this, an ROI pooling layer [28] is inserted in the middle of REN, taking as input the convolutional feature map $\phi_{conv}(x;\theta)$ and the region set R, as shown in Figure 2. The ROI pooling layer performs pooling on each region and outputs a matrix $\phi_{ROI}(x,R;\theta) \in \mathbb{R}^{B \times d_{ROI}}$, where $d_{ROI}$ is the dimension of the pooled representation of each bounding box.

The first data stream performs radical-level classification. In this data stream, the matrix $\phi_{ROI}(x,R;\theta)$ is processed by several fully connected layers, and each region is mapped to a $C_{rad}$-dimensional vector. These fully connected layers output a score matrix $\phi_c(x,R;\theta) \in \mathbb{R}^{B \times C_{rad}}$, to which a row-wise softmax is applied. The final output of this data stream, $\phi^{sm}_c(x,R;\theta)$, is given by

$$[\phi^{sm}_c(x,R;\theta)]_{ij} = \frac{\exp\,[\phi_c(x,R;\theta)]_{ij}}{\sum_{k=1}^{C_{rad}} \exp\,[\phi_c(x,R;\theta)]_{ik}}. \quad (1)$$

b) Radical-level detection data stream: The second data stream performs radical-level detection. As mentioned before, we learn to recognize radicals in a weakly supervised fashion: we do not know which bounding box contains a specific radical tightly. The aim of this data stream is to select the best bounding box for every radical. This stream also starts from the pooled representation matrix $\phi_{ROI}(x,R;\theta)$. We map each region to a $C_{rad}$-dimensional vector using several fully connected layers, whose weights are not shared with the fully connected layers in the first data stream. These layers output a score matrix $\phi_d(x,R;\theta) \in \mathbb{R}^{B \times C_{rad}}$, to which a column-wise softmax is applied. The final output of this data stream, $\phi^{sm}_d(x,R;\theta)$, is given by

$$[\phi^{sm}_d(x,R;\theta)]_{ij} = \frac{\exp\,[\phi_d(x,R;\theta)]_{ij}}{\sum_{k=1}^{B} \exp\,[\phi_d(x,R;\theta)]_{kj}}. \quad (2)$$

The radical score $\phi^{rad}(x,R;\theta) \in \mathbb{R}^{C_{rad}}$ is obtained by combining $\phi^{sm}_c(x,R;\theta)$ and $\phi^{sm}_d(x,R;\theta)$:

$$[\phi^{rad}(x,R;\theta)]_j = \sum_{k=1}^{B} [\phi^{sm}_c(x,R;\theta) \odot \phi^{sm}_d(x,R;\theta)]_{kj}, \quad (3)$$

where $\odot$ is the element-wise product. Note that each element of $\phi^{rad}$ lies in the range (0, 1). We interpret $[\phi^{rad}]_j$ as the confidence that character x contains the j-th radical somewhere.
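To make Eqs. (1)-(3) concrete, here is a minimal NumPy sketch of the score aggregation, assuming the two fully connected branches have already produced the raw score matrices `phi_c` and `phi_d` of shape (B, C_rad); the function and variable names are our own illustration, not the paper's code:

```python
import numpy as np

def radical_scores(phi_c: np.ndarray, phi_d: np.ndarray) -> np.ndarray:
    """Combine the two stream outputs into radical confidences.

    phi_c, phi_d: raw (pre-softmax) score matrices of shape (B, C_rad),
    one row per candidate bounding box, one column per radical.
    Returns a (C_rad,) vector with entries in (0, 1), cf. Eqs. (1)-(3).
    """
    # Eq. (1): row-wise softmax -- for each box, a distribution over radicals.
    e_c = np.exp(phi_c - phi_c.max(axis=1, keepdims=True))  # stabilized
    sm_c = e_c / e_c.sum(axis=1, keepdims=True)

    # Eq. (2): column-wise softmax -- for each radical, a distribution over boxes.
    e_d = np.exp(phi_d - phi_d.max(axis=0, keepdims=True))
    sm_d = e_d / e_d.sum(axis=0, keepdims=True)

    # Eq. (3): element-wise product, then sum over the B boxes.
    return (sm_c * sm_d).sum(axis=0)

# Toy usage: B = 200 candidate boxes, C_rad = 10 radicals.
rng = np.random.default_rng(0)
phi_rad = radical_scores(rng.normal(size=(200, 10)), rng.normal(size=(200, 10)))
print(phi_rad.shape, phi_rad.min() > 0, phi_rad.max() < 1)
```

Because each column of the detection softmax sums to one over the boxes, Eq. (3) acts as a soft selection: the detection stream weights each box's radical posterior, so $[\phi^{rad}]_j$ always stays in (0, 1).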
c) Character-level classification data stream: The third data stream operates at the character level. The aim of this stream is to obtain the final character-level classification score. We classify a Chinese character image based on two kinds of information: 1) the character image itself, and 2) the radicals recognized from this image. The character image provides necessary global context, and the recognized radicals capture the substructure of the character image. In this data stream, we fuse these two kinds of information. The stream starts from the convolutional feature map $\phi_{conv}(x;\theta)$ and maps it to a $C_{glo}$-dimensional global context vector $\phi_{glo}(x;\theta)$ using several fully connected layers. Then we apply a linear map followed by a softmax operator:

$$\phi^{cha} = \mathrm{Softmax}(W_1 \phi_{glo} + W_2 \phi^{rad}), \quad (4)$$

where $\phi^{cha} \in \mathbb{R}^C$ is the final character-level classification score, $W_1$ and $W_2$ are weights to be learned, and $W_1 \phi_{glo} + W_2 \phi^{rad} \in \mathbb{R}^C$.

C. Training REN

In this section, we explain how to train the model. The training data consists of N Chinese character images $\{x_1, \ldots, x_N\}$ with their character-level labels $\{y_1, \ldots, y_N\}$, where $y_i \in \{1, 2, \ldots, C\}$. We extract around B bounding boxes from $x_i$ using Edge Boxes, and denote the set of bounding boxes by $R_i$. Moreover, we have a character-radical correspondence matrix $T \in \{0,1\}^{C \times C_{rad}}$, indicating whether a character contains a particular radical. Note that this matrix is independent of the size of the training set, and thus easy to obtain. From the matrix T, we can construct a radical-level label $y^{rad}_i \in \{0,1\}^{C_{rad}}$ for image $x_i$, indicating whether a particular radical is present in image $x_i$. We do not need the locations of radicals during training.

We have two goals: 1) recognize characters, and 2) recognize radicals. Thus we define two energy functions:

$$J^{cha}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} \mathbb{1}\{y_i = j\} \log\,[\phi^{cha}(x_i, R_i; \theta)]_j, \quad (5)$$

and

$$J^{rad}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C_{rad}} \Big( \mathbb{1}\{y^{rad}_{ij} = 1\} \log\,[\phi^{rad}(x_i, R_i; \theta)]_j + \mathbb{1}\{y^{rad}_{ij} = 0\} \log\big(1 - [\phi^{rad}(x_i, R_i; \theta)]_j\big) \Big). \quad (6)$$

The character-level loss $J^{cha}(\theta)$ is a cross-entropy loss, and the radical-level loss $J^{rad}(\theta)$ is a sum of $C_{rad}$ binary log-loss terms. We use stochastic gradient descent to optimize the following multi-task loss:

$$J(\theta) = J^{cha}(\theta) + \lambda_1 J^{rad}(\theta) + \frac{\lambda_2}{2} \|\theta\|^2. \quad (7)$$
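A minimal NumPy sketch of the multi-task objective in Eqs. (5)-(7), reusing `radical_scores` from above; the helper names and batch layout are our own illustration, and the default value of `lam2` is an illustrative placeholder rather than the paper's setting:

```python
import numpy as np

def ren_loss(phi_cha, phi_rad, y, y_rad, theta_l2, lam1=0.1, lam2=1e-4):
    """Multi-task loss of Eq. (7) for a batch of N examples.

    phi_cha:  (N, C) softmax character scores, Eq. (4).
    phi_rad:  (N, C_rad) radical confidences in (0, 1), Eq. (3).
    y:        (N,) integer character labels.
    y_rad:    (N, C_rad) binary radical presence labels (rows of T).
    theta_l2: precomputed squared L2 norm of all network weights.
    lam2:     illustrative placeholder value for the weight decay.
    """
    n = y.shape[0]
    eps = 1e-12  # numerical floor for the logarithms

    # Eq. (5): character-level cross-entropy on the true-class probability.
    j_cha = -np.log(phi_cha[np.arange(n), y] + eps).mean()

    # Eq. (6): sum of C_rad binary log-losses on the radical confidences.
    j_rad = -(y_rad * np.log(phi_rad + eps)
              + (1 - y_rad) * np.log(1 - phi_rad + eps)).sum(axis=1).mean()

    # Eq. (7): weighted multi-task objective with L2 regularization.
    return j_cha + lam1 * j_rad + 0.5 * lam2 * theta_l2
```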

III. EXPERIMENTS

In this section, we evaluate REN on a challenging real-world dataset. Experimental results show that the proposed architecture provides a significant performance improvement on the rare Chinese character recognition task.

A. Dataset

Existing Chinese character datasets such as CASIA [29] usually consist of frequently used Chinese characters; thus, they are inappropriate for our task. In order to evaluate REN, we collect a rare Chinese character database called RCC. RCC is collected from Chinese official identity card images and Chinese official invoice images. Example images from RCC are shown in Figure 3. RCC consists of C = 479 different characters and 378,562 character images. Among the 479 character categories, $C_{com}$ = 452 characters are frequently used and $C_{rare}$ = 27 are rarely used; on average, each frequently used category contains far more images than each rarely used category. The 27 rare Chinese character categories contain $C_{rad}$ = 10 different radicals in total. The names of these 10 radicals are listed in Table II. For each of the 479 characters, we manually labeled a 10-dimensional vector indicating whether a specific radical is present in the character. We guarantee that for each of the 479 character categories, at least one of these 10 radicals is present. For frequently used characters, we use 60% of the images for training, 10% for validation, and 30% for testing. For rarely used characters, we use 20% of the images for training, 10% for validation, and 70% for testing.

Fig. 3. Example images in RCC.

B. Evaluation metrics

We evaluate both radical-level and character-level accuracy. For a character image x, we have a radical-level prediction score $\phi^{rad}(x,R;\theta) \in \mathbb{R}^{C_{rad}}$ and a character-level prediction score $\phi^{cha}(x,R;\theta) \in \mathbb{R}^C$. We evaluate Average Precision (AP) on both $\phi^{rad}(x,R;\theta)$ and $\phi^{cha}(x,R;\theta)$, and report VOC-style 11-point AP [30] on the test set in all experiments.

C. Experimental setup

d) Network architectures: Very deep neural networks such as VGG [31], GoogLeNet [32], or ResNet-101 [11] are very time consuming. Thus, in this paper we evaluate three shallower network architectures: a small model S, a medium model M, and a large model L. We set stride = 1 for all convolutional layers. The detailed architectures of these three models are shown in Table I. We set $C_{glo} = 2 C_{rad}$ in all experiments.

e) Training: We generate around 200 bounding boxes for each image. We set $\lambda_1 = 0.1$ in all experiments. For SGD, a momentum of 0.9 and a weight decay of $\lambda_2$ are used. A mini-batch is composed of 10 images, and we run SGD for 10 epochs, lowering the learning rate after the first 5 epochs. Our implementation is based on the publicly available MATLAB implementation of WSDDN by Bilen et al. [22]. It takes about 25 hours to train a model L on a Pascal TITAN X GPU.
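For readers replicating this schedule outside MATLAB, a rough PyTorch equivalent of the SGD setup described above is sketched here; the learning-rate and weight-decay values are illustrative placeholders, and `ren` (the model, returning the Eq. (7) loss) and `train_loader` are assumed to be defined elsewhere:

```python
import torch

LR_FIRST, LR_SECOND = 1e-3, 1e-4  # illustrative placeholder values

optimizer = torch.optim.SGD(ren.parameters(), lr=LR_FIRST, momentum=0.9,
                            weight_decay=1e-4)  # weight decay also illustrative

for epoch in range(10):
    if epoch == 5:  # lower the learning rate for the last 5 epochs
        for group in optimizer.param_groups:
            group["lr"] = LR_SECOND
    for images, boxes, y, y_rad in train_loader:  # mini-batches of 10 images
        optimizer.zero_grad()
        loss = ren(images, boxes, y, y_rad)  # multi-task loss, Eq. (7)
        loss.backward()
        optimizer.step()
```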
D. Radical recognition results

Radical-level recognition results on the test set are summarized in Table II. We evaluate the different CNN architectures on both frequently used characters ("com" in Table II) and rarely used characters ("rare" in Table II), and report the average precision for each radical. The last column of Table II reports the mean average precision (mAP) over the 10 radicals. Our best model, REN-L, obtains 93.5% radical-level classification mAP on frequently used characters and 90.5% mAP on rarely used characters.

The quantitative results show that larger CNN models provide better performance than smaller ones. Moreover, Table II shows that the classification performance on frequently used characters and on rarely used characters is roughly the same, indicating that our models generalize well to unseen rarely used characters, even though we have only a small number of training images for them. This can be explained by the fact that radicals are shared between frequently used and rarely used characters, so our model can learn accurate appearance models of radicals from frequently used characters. Figure 4 shows the response maps of the radical detection stream. We predict accurate radical locations on both unseen frequently used and rarely used Chinese character images. Note that we do not use any templates or other prior knowledge about the shapes of radicals; all appearance models are automatically learned from data.

E. Character recognition results

Character-level recognition results on the test set are summarized in Table III. Since the data is highly imbalanced, we use mAP to evaluate our models. In order to highlight the effectiveness of our three-data-stream architecture, we remove the radical-level classification data stream and the radical-level detection data stream, and train end-to-end character-level classification CNN models (the CNN- models in Table III). We evaluate average precision for each character category. As in Section III-D, we report mAP on both frequently used characters ("com" in Table III) and rarely used characters ("rare" in Table III). The last column of Table III reports mAP on all categories.

TABLE I
CONVNET CONFIGURATIONS.

model S: conv - max pool 2x2 - conv - max pool 2x2 - conv - ROI pool 5x5
model M: conv - max pool 2x2 - conv - max pool 2x2 - conv - conv - max pool 2x2 - ROI pool 6x6
model L: conv - max pool 2x2 - conv - max pool 2x2 - conv - conv - conv - ROI pool 6x6
radical cls stream (S/M/L): fc 512 / fc 1024 / fc 2048, then dropout 0.5, then fc C_rad
radical det stream (S/M/L): fc 512 / fc 1024 / fc 2048, then dropout 0.5, then fc C_rad
character cls stream (all models): dropout 0.5, then fc C

TABLE II
COMPARISON OF RADICAL-LEVEL CLASSIFICATION AVERAGE PRECISION.

dataset  model  | mu  you  ma  yue  dao  ren  qie  tu  cun  xin | mAP
com      REN-S
com      REN-M
com      REN-L
rare     REN-S
rare     REN-M
rare     REN-L

TABLE III
COMPARISON OF CHARACTER-LEVEL CLASSIFICATION MEAN AVERAGE PRECISION.

model  | mAP (com)  mAP (rare)  mAP (all)
CNN-S
REN-S
CNN-M
REN-M
CNN-L
REN-L

Our best model, REN-L, obtains 92.5% character-level classification mAP on rare characters, which is much better than CNN-L (65.1% mAP). Table III shows that REN models with three data streams always provide better performance on rarely used characters than character-level classification CNN models. On frequently used characters, REN models and CNN models achieve comparable performance. This indicates that the proposed architecture improves character-level recognition performance, which is of great importance to an OCR system.

IV. RELATED WORK

A. Radical-based Chinese character recognition

Since classification of radicals is much easier than classification of raw Chinese characters, many studies focus on radical extraction and recognition. Wang and Fan [9] propose a hierarchical matching approach for radical extraction. Chung and Ip [5] use a deformable model to decompose Chinese characters. Shi et al. [3] use Active Shape Models (ASMs) to extract radicals; kernel PCA is used to learn the appearance models of radicals, and radical-level training images are required. Ni et al. [21] use a cascade classifier [33] to detect radicals; Haar-like features and AdaBoost [34] are core components of their detectors. Many radical-based approaches are segmentation based, and important components of them are hand-crafted. Since many of these components are not learnable, they are usually not robust to noise. Several approaches [3], [21] achieve excellent performance by learning to recognize radicals from radical-level training images. Our method differs from existing radical extraction approaches in two aspects: 1) we do not rely on prior knowledge about the shapes of radicals, and all appearance models are learned from data; 2) we only need character-level annotations during training, which are much cheaper to collect than radical-level annotations.

B. Weakly supervised object detection

Weakly supervised object detection is an important problem in computer vision, since bounding box annotations are costly to obtain. Song et al. [25] propose a multiple instance learning framework to mine positive bounding boxes from a noisy collection of object proposals. Li et al. [23] propose a progressive domain adaptation approach for bounding box mining. Bilen et al. [22] and Kantorov et al. [24] use an end-to-end trainable WSDDN to perform region selection and classification simultaneously. WSDDN has two data streams: a classification stream (radical-level classification in REN) and a detection stream (radical-level detection in REN). We add a character-level classification data stream to WSDDN, since the final goal of REN is to classify different characters.
V. CONCLUSION

We present the Radical Extraction Network (REN) for rare Chinese character recognition. REN learns to recognize radicals under character-level supervision, and no prior knowledge about the shapes of radicals is needed during training. Experimental results show that the proposed method can recognize rare Chinese characters with high accuracy.

ACKNOWLEDGMENT

This work is funded by NSFC and by the German Research Foundation (DFG) in project Crossmodal Learning, DFG TRR.

Fig. 4. Examples of detection response maps for the radical tu. (a): A standard radical tu; this template is only used for visualization, not training. (b)(d)(f): Test images. (c)(e)(g): Visualizations of $\phi^{sm}_d$ corresponding to the test images. The radical extractor localizes the radical tu in the test images. Best viewed in color.

REFERENCES

[1] F. Kimura, K. Takashina, S. Tsuruoka, and Y. Miyake, "Modified quadratic discriminant functions and the application to Chinese character recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 1.
[2] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11.
[3] D. Shi, S. R. Gunn, and R. I. Damper, "Handwritten Chinese radical recognition using nonlinear active shape models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 2.
[4] C.-W. Liao and J. S. Huang, "A transformation invariant matching algorithm for handwritten Chinese character recognition," Pattern Recognition, vol. 23, no. 11.
[5] F.-L. Chung and W. W. Ip, "Complex character decomposition using deformable model," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 31, no. 1.
[6] K. Chellapilla and P. Simard, "A new radical based approach to offline handwritten East-Asian character recognition," in Tenth International Workshop on Frontiers in Handwriting Recognition. Suvisoft.
[7] L.-L. Ma and C.-L. Liu, "A new radical-based approach to online handwritten Chinese character recognition," in Proc. International Conference on Pattern Recognition (ICPR). IEEE, 2008.
[8] D. Shi, S. R. Gunn, and R. I. Damper, "Handwritten Chinese character recognition using nonlinear active shape models and the Viterbi algorithm," Pattern Recognition Letters, vol. 23, no. 14.
[9] A.-B. Wang and K.-C. Fan, "Optical recognition of handwritten Chinese characters by hierarchical radical matching method," Pattern Recognition, vol. 34, no. 1.
[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012.
[11] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[12] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[13] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Advances in Neural Information Processing Systems, 2015.
[14] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[15] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2.
[16] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded up robust features," in European Conference on Computer Vision. Springer, 2006.
[17] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2.
[18] J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, "Image classification with the Fisher vector: Theory and practice," International Journal of Computer Vision, vol. 105, no. 3.
[19] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, "Synthetic data and artificial neural networks for natural scene text recognition," arXiv preprint.
[20] R. Messina and J. Louradour, "Segmentation-free handwritten Chinese text recognition with LSTM-RNN," in Proc. International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2015.
[21] E. Ni, M. Jiang, and C. Zhou, "Radical extraction for handwritten Chinese character recognition by using radical cascade classifier," in Electrical, Information Engineering and Mechatronics. Springer, 2012.
[22] H. Bilen and A. Vedaldi, "Weakly supervised deep detection networks," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[23] D. Li, J.-B. Huang, Y. Li, S. Wang, and M.-H. Yang, "Weakly supervised object localization with progressive domain adaptation," in Proc. IEEE Conference on Computer Vision and Pattern Recognition.
[24] V. Kantorov, M. Oquab, M. Cho, and I. Laptev, "ContextLocNet: Context-aware deep network models for weakly supervised localization," in European Conference on Computer Vision. Springer, 2016.
[25] H. O. Song, R. B. Girshick, S. Jegelka, J. Mairal, Z. Harchaoui, T. Darrell et al., "On learning to localize objects with minimal supervision," in ICML, 2014.
[26] C. L. Zitnick and P. Dollár, "Edge boxes: Locating object proposals from edges," in European Conference on Computer Vision. Springer, 2014.
[27] J. R. Uijlings, K. E. Van De Sande, T. Gevers, and A. W. Smeulders, "Selective search for object recognition," International Journal of Computer Vision, vol. 104, no. 2.
[28] R. Girshick, "Fast R-CNN," in Proc. IEEE International Conference on Computer Vision, 2015.
[29] C.-L. Liu, F. Yin, D.-H. Wang, and Q.-F. Wang, "CASIA online and offline Chinese handwriting databases," in Proc. International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2011.
[30] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results."
[31] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint.
[32] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[33] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1. IEEE, 2001.
[34] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," in European Conference on Computational Learning Theory. Springer, 1995.


More information

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018 Object Detection TA : Young-geun Kim Biostatistics Lab., Seoul National University March-June, 2018 Seoul National University Deep Learning March-June, 2018 1 / 57 Index 1 Introduction 2 R-CNN 3 YOLO 4

More information

Ryerson University CP8208. Soft Computing and Machine Intelligence. Naive Road-Detection using CNNS. Authors: Sarah Asiri - Domenic Curro

Ryerson University CP8208. Soft Computing and Machine Intelligence. Naive Road-Detection using CNNS. Authors: Sarah Asiri - Domenic Curro Ryerson University CP8208 Soft Computing and Machine Intelligence Naive Road-Detection using CNNS Authors: Sarah Asiri - Domenic Curro April 24 2016 Contents 1 Abstract 2 2 Introduction 2 3 Motivation

More information

Kaggle Data Science Bowl 2017 Technical Report

Kaggle Data Science Bowl 2017 Technical Report Kaggle Data Science Bowl 2017 Technical Report qfpxfd Team May 11, 2017 1 Team Members Table 1: Team members Name E-Mail University Jia Ding dingjia@pku.edu.cn Peking University, Beijing, China Aoxue Li

More information

FUSION NETWORK FOR FACE-BASED AGE ESTIMATION

FUSION NETWORK FOR FACE-BASED AGE ESTIMATION FUSION NETWORK FOR FACE-BASED AGE ESTIMATION Haoyi Wang 1 Xingjie Wei 2 Victor Sanchez 1 Chang-Tsun Li 1,3 1 Department of Computer Science, The University of Warwick, Coventry, UK 2 School of Management,

More information

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS Kuan-Chuan Peng and Tsuhan Chen School of Electrical and Computer Engineering, Cornell University, Ithaca, NY

More information

Deeply Cascaded Networks

Deeply Cascaded Networks Deeply Cascaded Networks Eunbyung Park Department of Computer Science University of North Carolina at Chapel Hill eunbyung@cs.unc.edu 1 Introduction After the seminal work of Viola-Jones[15] fast object

More information

Aggregating Frame-level Features for Large-Scale Video Classification

Aggregating Frame-level Features for Large-Scale Video Classification Aggregating Frame-level Features for Large-Scale Video Classification Shaoxiang Chen 1, Xi Wang 1, Yongyi Tang 2, Xinpeng Chen 3, Zuxuan Wu 1, Yu-Gang Jiang 1 1 Fudan University 2 Sun Yat-Sen University

More information

Fewer is More: Image Segmentation Based Weakly Supervised Object Detection with Partial Aggregation

Fewer is More: Image Segmentation Based Weakly Supervised Object Detection with Partial Aggregation GE, WANG, QI: FEWER IS MORE 1 Fewer is More: Image Segmentation Based Weakly Supervised Object Detection with Partial Aggregation Ce Ge nwlgc@bupt.edu.cn Jingyu Wang wangjingyu@bupt.edu.cn Qi Qi qiqi@ebupt.com

More information

Face Recognition A Deep Learning Approach

Face Recognition A Deep Learning Approach Face Recognition A Deep Learning Approach Lihi Shiloh Tal Perl Deep Learning Seminar 2 Outline What about Cat recognition? Classical face recognition Modern face recognition DeepFace FaceNet Comparison

More information

Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling

Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling [DOI: 10.2197/ipsjtcva.7.99] Express Paper Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling Takayoshi Yamashita 1,a) Takaya Nakamura 1 Hiroshi Fukui 1,b) Yuji

More information

Deep Learning and Its Applications

Deep Learning and Its Applications Convolutional Neural Network and Its Application in Image Recognition Oct 28, 2016 Outline 1 A Motivating Example 2 The Convolutional Neural Network (CNN) Model 3 Training the CNN Model 4 Issues and Recent

More information

Fuzzy Set Theory in Computer Vision: Example 3

Fuzzy Set Theory in Computer Vision: Example 3 Fuzzy Set Theory in Computer Vision: Example 3 Derek T. Anderson and James M. Keller FUZZ-IEEE, July 2017 Overview Purpose of these slides are to make you aware of a few of the different CNN architectures

More information

Como funciona o Deep Learning

Como funciona o Deep Learning Como funciona o Deep Learning Moacir Ponti (com ajuda de Gabriel Paranhos da Costa) ICMC, Universidade de São Paulo Contact: www.icmc.usp.br/~moacir moacir@icmc.usp.br Uberlandia-MG/Brazil October, 2017

More information

Machine Learning 13. week

Machine Learning 13. week Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of

More information

Efficient Segmentation-Aided Text Detection For Intelligent Robots

Efficient Segmentation-Aided Text Detection For Intelligent Robots Efficient Segmentation-Aided Text Detection For Intelligent Robots Junting Zhang, Yuewei Na, Siyang Li, C.-C. Jay Kuo University of Southern California Outline Problem Definition and Motivation Related

More information