Handwritten Chinese Character Recognition by Joint Classification and Similarity Ranking


International Conference on Frontiers in Handwriting Recognition

Cheng Cheng, Xu-Yao Zhang, Xiao-Hu Shao and Xiang-Dong Zhou
Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences
Institute of Automation, Chinese Academy of Sciences
{chengcheng, shaoxiaohu,

Abstract

Deep convolutional neural networks (DCNN) have recently achieved state-of-the-art performance on handwritten Chinese character recognition (HCCR). However, most DCNN models employ the softmax activation function and minimize the cross-entropy loss, which may lose some inter-class information. To cope with this problem, we demonstrate a small but consistent advantage of using both classification and similarity ranking signals as supervision. Specifically, the presented method learns a DCNN model by maximizing the inter-class variations and minimizing the intra-class variations, while simultaneously minimizing the cross-entropy loss. In addition, we review some loss functions for similarity ranking and evaluate their performance. Our experiments demonstrate that the presented method achieves state-of-the-art accuracy on the well-known ICDAR 2013 offline HCCR competition dataset.

Index Terms: Similarity Ranking, Character Recognition, Deep Convolutional Neural Networks

I. INTRODUCTION

Handwritten Chinese character recognition (HCCR) has been intensively studied over the past forty years and is of practical importance for bank check reading, tax form processing, book and handwritten note transcription, and so on. Although many studies have been conducted, it remains a challenging problem due to the diversity of writing styles, the large character set and the presence of many confusable character pairs. Some samples with different writing styles and some confusable character pairs are shown in Fig. 1 and Fig. 2, respectively.

Fig. 1: Characters with different writing styles.

Fig. 2: Examples of confused character pairs.

Existing HCCR methods can be broadly classified into two categories: traditional methods and DCNN based methods. In the first category, there are typically four basic steps: shape normalization, feature extraction, dimensionality reduction and classifier construction. To improve recognition performance, many effective methods have been proposed for these steps, including nonlinear normalization [17], pseudo two-dimensional normalization [14], gradient direction feature extraction [13], the modified quadratic discriminant function [12] and the discriminative learning quadratic discriminant function [15]. In the second category, a DCNN model composed of layers of convolution, rectification and pooling is trained via back propagation. Unlike traditional methods, DCNN based methods replace the separate steps of feature extraction, dimensionality reduction and classifier construction with a single deep architecture, and of the four steps they require only shape normalization. The expressivity of these models and their robust training algorithms allow powerful object representations to be learned without handcrafted features. However, most DCNN based methods use the softmax activation function (also known as multinomial logistic regression) for classification, which we find may lose some inter-class information.

In this paper, we contribute to the second category and present a deep triplet network (DTN) method, the basic idea of which is illustrated in Fig. 3. Unlike most existing methods, the presented method learns a DCNN model using both classification and similarity ranking signals as supervision. Classification assigns an input image to one of a large number of identity classes, while similarity ranking minimizes the intra-class distance while maximizing the inter-class distance.
In addition, we investigate the loss functions of similarity ranking algorithms and aim to improve performance using a new form of loss function.

Fig. 3: The structure of the proposed model (CNN, fc, softmax, triplet ranking).

The rest of this paper is organized as follows: Section II briefly introduces related previous work; Section III reviews the softmax and similarity ranking; Section IV presents the loss functions for similarity ranking; Section V presents our experimental results; and the last section concludes the paper.

II. RELATED WORK

A. HCCR by DCNN

In recent years, DCNN has received increasing interest in computer vision and machine learning. A number of DCNN methods have been proposed for HCCR in the literature [2], [3], [4], [21], and they continued their success by winning both the online and offline HCCR competitions at ICDAR 2013 [23]. Generally, DCNN aims to learn hierarchical feature representations by building high-level features from low-level ones. There are two notable breakthroughs. The first is large-scale character classification with DCNN [25], [26], in which domain-specific knowledge, such as Gabor or normalization-cooperated direction-decomposed feature maps, is used to enhance the performance of DCNN. The second is supervised DCNN with both character reconstruction and verification tasks [2]. The reconstruction task minimizes the distance between features of the same category. In this paper, we extend the DCNN model using classification and similarity ranking signals as supervision.

B. Similarity Ranking

The present method falls under the broad umbrella of similarity ranking. Similarity ranking based DCNN methods have proved effective in a wide range of tasks, such as face recognition [10], [18], person re-identification [5], [24] and image retrieval [20]. The framework of the above-mentioned papers is to organize the training images into a batch of triplet samples, each sample containing two images with the same label and one with a different label. With these triplet samples, they minimize the intra-class distance while maximizing the inter-class distance for each triplet unit using the Euclidean distance metric. In the field of character recognition, the closest method to our approach is the discriminative DCNN by Kim et al. [11].
The discriminative DCNN focuses on the differences among similar classes, and thereby improves the discrimination ability of the DCNN.

III. METHODOLOGY

We aim to train a DCNN model using both classification and similarity ranking signals as supervision. The first is the character classification signal, which classifies each character image into one of n (e.g., n = 3755) different categories. It is achieved by following the fully connected layer with an n-way softmax layer, which outputs a probability distribution over the n classes. The network is trained to minimize the cross-entropy loss, which is denoted as

$$L(f_i, k, \theta_{cls}) = -\sum_{i=1}^{n} p_i \log \hat{p}_i \quad (1)$$

in which k is the true class label, $L(f_i, k, \theta_{cls})$ is the standard cross-entropy/log loss, and $\theta_{cls}$ denotes the softmax layer parameters. $p_i$ is the target probability distribution, where $p_i = 0$ for all i except $p_i = 1$ for the target class k, and $\hat{p}_i$ is the probability predicted by the softmax layer.

The second is the similarity ranking signal, which projects character pairs into the same feature subspace. The distance of each positive character pair should be less than a smaller threshold and that of each negative pair should be higher than a larger threshold. We adopt the following loss function, which was originally proposed by Wang et al. [20] and is widely used in face recognition [18], person re-identification [5] and image retrieval [24]:

$$L(f_i, f_j, f_k, \theta_{tri}) = \max\left(\|f_i - f_j\|_2^2 - \|f_i - f_k\|_2^2 + \Delta,\ 0\right) \quad (2)$$

in which $f_i, f_j$ are features of two character images that have the same label, $f_i, f_k$ are features of two mismatched character images, $\Delta$ is a margin that is enforced between positive and negative image pairs, and $\theta_{tri}$ is the parameter to be learned in the similarity ranking loss function. Both loss functions are evaluated and compared in our experiments. Our goal is to learn the parameters $\theta_{con}$ of the DCNN model, while $\theta_{tri}$ and $\theta_{cls}$ are parameters introduced to propagate the classification and similarity ranking signals during training.
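The two supervisory signals of Eqs. (1) and (2) can be illustrated on toy values. The following is only a minimal NumPy sketch, not the authors' Caffe implementation: the function names, the three-class logits and the two-dimensional features are hypothetical and chosen purely for illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D vector of class scores.
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def cross_entropy_loss(logits, k):
    # Eq. (1) with a one-hot target: only the true class k contributes,
    # so the sum reduces to -log of the predicted probability of class k.
    return -np.log(softmax(logits)[k])

def triplet_loss(f_i, f_j, f_k, margin):
    # Eq. (2): hinge on squared Euclidean distances with margin Delta.
    d_pos = np.sum((f_i - f_j) ** 2)  # same-label pair
    d_neg = np.sum((f_i - f_k) ** 2)  # mismatched pair
    return max(d_pos - d_neg + margin, 0.0)

# Toy values (assumptions, not taken from the paper).
logits = np.array([2.0, 0.5, -1.0])
f_i = np.array([1.0, 0.0])
f_j = np.array([0.8, 0.1])
f_k = np.array([-1.0, 0.5])

cls_loss = cross_entropy_loss(logits, k=0)
satisfied = triplet_loss(f_i, f_j, f_k, margin=1.0)  # negative is far away
violated = triplet_loss(f_i, f_k, f_j, margin=1.0)   # roles of f_j, f_k swapped
```

A triplet whose negative is already farther than the positive by more than the margin contributes zero ranking loss, so only violating triplets produce a ranking gradient.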
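In the testing stage, only $\theta_{cls}$ and $\theta_{con}$ are used for classification. The parameters are updated by stochastic gradient descent on each triplet unit. The gradients of the two supervisory signals are weighted by a hyperparameter $\lambda$. Our learning algorithm is summarized in Algorithm 1.

Algorithm 1 The learning algorithm
Require: training set $\chi = \{x_i, y_i\}$, initialized parameters $\theta_{cls}$, $\theta_{con}$ and $\theta_{tri}$, hyperparameter $\lambda$
Ensure: parameters $\theta_{cls}$, $\theta_{con}$ and $\theta_{tri}$
1: for m = 1 to iter do
2: sample a triplet unit $(x_i, y_i)$, $(x_j, y_j)$ and $(x_k, y_k)$ from $\chi$, in which $x_i, x_j$ have the same label
3: $f_i = \mathrm{Conv}(x_i, \theta_{con})$, $f_j = \mathrm{Conv}(x_j, \theta_{con})$, $f_k = \mathrm{Conv}(x_k, \theta_{con})$
4: $\nabla\theta_{cls} = \frac{\partial L(f_i, y_i, \theta_{cls})}{\partial \theta_{cls}} + \frac{\partial L(f_j, y_j, \theta_{cls})}{\partial \theta_{cls}} + \frac{\partial L(f_k, y_k, \theta_{cls})}{\partial \theta_{cls}}$
5: $\nabla\theta_{tri} = \lambda \frac{\partial L(f_i, f_j, f_k, \theta_{tri})}{\partial \theta_{tri}}$
6: $\nabla f_i = \frac{\partial L(f_i, y_i, \theta_{cls})}{\partial f_i} + \lambda \frac{\partial L(f_i, f_j, f_k, \theta_{tri})}{\partial f_i}$
7: $\nabla f_j = \frac{\partial L(f_j, y_j, \theta_{cls})}{\partial f_j} + \lambda \frac{\partial L(f_i, f_j, f_k, \theta_{tri})}{\partial f_j}$
8: $\nabla f_k = \frac{\partial L(f_k, y_k, \theta_{cls})}{\partial f_k} + \lambda \frac{\partial L(f_i, f_j, f_k, \theta_{tri})}{\partial f_k}$
9: $\nabla\theta_{con} = \nabla f_i \cdot \frac{\partial \mathrm{Conv}(x_i, \theta_{con})}{\partial \theta_{con}} + \nabla f_j \cdot \frac{\partial \mathrm{Conv}(x_j, \theta_{con})}{\partial \theta_{con}} + \nabla f_k \cdot \frac{\partial \mathrm{Conv}(x_k, \theta_{con})}{\partial \theta_{con}}$
10: end for

IV. LOSS FUNCTION FOR SIMILARITY RANKING

In this section, we describe and compare three different loss functions, which can be used in the proposed framework.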
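Steps 6 to 8 of Algorithm 1 combine, for each feature vector, the gradient of the classification loss with the λ-weighted gradient of the ranking loss. The sketch below works through that combination for the Euclidean triplet loss of Eq. (2). It rests on assumptions that are not in the paper: the softmax layer is modelled as a single bias-free linear map `W`, the features are three-dimensional toy vectors, and all function names are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cls_loss(f, y, W):
    # Cross-entropy of a linear softmax layer; W stands in for theta_cls.
    return -np.log(softmax(W @ f)[y])

def cls_grad(f, y, W):
    # Gradient of cls_loss w.r.t. the feature f: W^T (p_hat - onehot(y)).
    p = softmax(W @ f)
    p[y] -= 1.0
    return W.T @ p

def tri_loss(f_i, f_j, f_k, margin):
    return max(np.sum((f_i - f_j) ** 2) - np.sum((f_i - f_k) ** 2) + margin, 0.0)

def tri_grads(f_i, f_j, f_k, margin):
    # Gradients of the hinge in Eq. (2); zero when the margin is satisfied.
    if tri_loss(f_i, f_j, f_k, margin) <= 0.0:
        z = np.zeros_like(f_i)
        return z, z.copy(), z.copy()
    return 2 * (f_k - f_j), 2 * (f_j - f_i), 2 * (f_i - f_k)

def joint_loss(f_i, f_j, f_k, y_i, y_k, W, lam, margin):
    # x_i and x_j share the label y_i; x_k has the different label y_k.
    cls = cls_loss(f_i, y_i, W) + cls_loss(f_j, y_i, W) + cls_loss(f_k, y_k, W)
    return cls + lam * tri_loss(f_i, f_j, f_k, margin)

def feature_grads(f_i, f_j, f_k, y_i, y_k, W, lam, margin):
    # Steps 6-8: classification gradient plus lambda times ranking gradient.
    t_i, t_j, t_k = tri_grads(f_i, f_j, f_k, margin)
    return (cls_grad(f_i, y_i, W) + lam * t_i,
            cls_grad(f_j, y_i, W) + lam * t_j,
            cls_grad(f_k, y_k, W) + lam * t_k)

# Toy setup (assumed values): 4 classes, 3-dimensional features.
W = np.random.default_rng(0).normal(size=(4, 3))
f_i = np.array([0.5, -0.2, 0.1])
f_j = np.array([0.4, 0.0, 0.3])
f_k = np.array([-0.3, 0.6, 0.2])
y_i, y_k = 1, 3
lam, margin = 0.5, 2.0
g_i, g_j, g_k = feature_grads(f_i, f_j, f_k, y_i, y_k, W, lam, margin)
```

In a full network these feature gradients would then be back-propagated through the convolutional layers (step 9), which a deep learning framework normally handles automatically.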

A. Euclidean Distance

In the absence of prior knowledge, most similarity ranking methods use the simple Euclidean distance to measure the dissimilarity between examples represented as vector inputs. The cost function over the distance metric of Eq. 2 has two competing terms. The first term penalizes large distances between each input and its target neighbors, while the second term penalizes small distances between each input and all other inputs that do not share the same label. Specifically, the cost function is given by:

$$L = \sum_{n=1}^{N} \left[ \|f_i - f_j\|_2^2 - \|f_i - f_k\|_2^2 + \Delta \right]_+ \quad (3)$$

where N is the number of triplets and $[\cdot]_+ = \max(\cdot, 0)$ as in Eq. 2. For a triplet that violates the margin, the derivatives of the loss with respect to the character features are:

$$\frac{\partial L}{\partial f_i} = 2(f_i - f_j) - 2(f_i - f_k), \qquad \frac{\partial L}{\partial f_j} = -2(f_i - f_j), \qquad \frac{\partial L}{\partial f_k} = 2(f_i - f_k) \quad (4)$$

B. Logistic Discriminant Based

The Euclidean distance is sensitive to scale and is blind to correlations across dimensions. To overcome this deficiency, we use a standard linear logistic discriminant function to model the triplet units as:

$$p_n = \frac{1}{1 + e^{-d}} = \sigma(d) \quad (5)$$

in which $d_p = \|f_i - f_j\|_2^2$, $d_n = \|f_i - f_k\|_2^2$ and $d = d_p - d_n$. We model the probability $p_n$ that the triplet $n = (i, j, k)$ is positive (fulfills the constraint in Eq. 2). If $d < 0$, the triplet is misclassified. We use maximum log-likelihood to optimize the parameters of the model. The log-likelihood L can be written as:

$$L = \sum_{n=1}^{N} \left[ t_n \ln p_n + (1 - t_n) \ln(1 - p_n) \right] \quad (6)$$

C. Conditional Log-likelihood Loss

In [9], the generalization limitations of the above loss are discussed, and a regularization term is added to the loss function to avoid over-fitting in training as well as to maximize the hypothesis margin. Following [9], we rewrite the loss function as:

$$p_n = \sigma(d) + \alpha \|f_i - f_j\|_2^2 \quad (7)$$

where $\alpha$ is the regularization coefficient. Intuitively, the regularizer pays more attention to the intra-class variations.

V. EXPERIMENTS

To verify the effectiveness of the presented method, we conduct experiments on the offline HCCR databases [16], including DB1.0, DB1.1 and the test set of the ICDAR-2013 Chinese handwriting recognition competition [23] (denoted as ICDAR-2013).

Fig. 4: The network architecture of the presented method.

A. Implementation Details

We implement the presented methods using Caffe [8] with our modifications. All experiments are run on four GPUs. All the models are trained with the same implementation, as follows.

1) Data Augmentation: During training, the image is perturbed by the single model or the combined model, as in [1]. Half of the samples, chosen at random, are flipped horizontally. We also adopt some augmentations proposed in previous work [22], such as adding a random integer ranging from -20 to +20 to the image, grey shifting, Gaussian blur, and so on.

2) Network and Settings: Deep residual networks and the Inception architecture were independently proposed in [6] and [19]. Both achieved high performance in the ImageNet challenges. Integrating the techniques of these two papers, we design the architecture shown in Fig. 4. It consists of 2 convolutional layers, 9 Inception layers, 5 pooling layers, 2 fully connected layers, 1 similarity loss layer and 1 softmax loss layer. The first four pooling layers use the max operator and the last pooling layer uses the average. The outputs of the 9 Inception layers are added to the outputs of the last fc layer. Following [7], batch normalization is used after each convolution and before the ReLU activation. We train the DCNN models using SGD with a mini-batch size of 360. The learning rate is initially set to 5e-2 and is gradually reduced to 1e-5. The models are initialized randomly from zero-mean Gaussian distributions, and trained on four Titan X GPUs for 300 hours.

TABLE I: Recognition rates (%) on DB1.1 and ICDAR-2013, trained with DB1.1

loss function | DB1.1 top-1 | DB1.1 top-10 | ICDAR-2013 top-1 | ICDAR-2013 top-10
softmax | | | |
softmax + similarity ranking (ED) | | | |
softmax + similarity ranking (LD) | | | |
softmax + similarity ranking (CLL) | | | |

TABLE II: Recognition rates (%) on ICDAR-2013, trained with DB1.0 and DB1.1

loss function | Ensemble | method | top-1 | top-10 | Dic. size
softmax | no | GoogleNet [26] | | | MB
softmax | no | directmap [25] | | NULL | 23.50MB
softmax | no | ours | | | MB
softmax + CLL | no | ours | | | M
softmax + CLL | yes (4) | ours | | | M

B. Results on DB1.1 and ICDAR-2013

In this experiment, we used 240 writers (no ) of the DB1.1 database for training, and tested on the remaining 40 writers and on the test set of the ICDAR 2013 Chinese handwriting recognition competition, respectively. In Section III, we introduced the method for training a DCNN model using both classification and similarity ranking signals as supervision, and in Section IV three loss functions for similarity ranking, namely Euclidean distance (ED), logistic discriminant based (LD) and conditional log-likelihood loss (CLL). The recognition results on the DB1.1 and ICDAR-2013 test sets are shown in Table I. First, compared with the baseline DCNN method using only the softmax function in Table I, the recognition accuracy is further improved by combining the two types of signals, especially for similarity ranking with the CLL loss function. This demonstrates the benefits of the proposed method, improving the top-1 recognition rate from percent to percent. Next, we compare the results of the three loss functions for similarity ranking.
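For reference, two of the ranking objectives being compared, the summed Euclidean hinge of Eq. (3) and the logistic log-likelihood of Eqs. (5) and (6), can be written over a batch of triplets as in the sketch below. This is a literal, illustrative transcription under assumptions: the function names and toy distances are hypothetical, and because the text does not define the targets $t_n$, they are passed in as a caller-supplied 0/1 array. The CLL variant of Eq. (7) is left out.

```python
import numpy as np

def squared_distances(F_i, F_j, F_k):
    # Row n of each array holds the features of triplet n = (i, j, k).
    d_p = np.sum((F_i - F_j) ** 2, axis=1)  # same-label pairs
    d_n = np.sum((F_i - F_k) ** 2, axis=1)  # mismatched pairs
    return d_p, d_n

def euclidean_ranking_loss(d_p, d_n, margin):
    # Eq. (3): hinge on d_p - d_n + margin, summed over the N triplets.
    return np.sum(np.maximum(d_p - d_n + margin, 0.0))

def logistic_log_likelihood(d_p, d_n, t):
    # Eqs. (5)-(6): p_n = sigma(d) with d = d_p - d_n, then the Bernoulli
    # log-likelihood for targets t (t is an assumption: not defined in the text).
    d = d_p - d_n
    p = 1.0 / (1.0 + np.exp(-d))
    return np.sum(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

# Toy batch of two triplets, given directly as squared distances (assumed values).
d_p = np.array([0.5, 2.0])
d_n = np.array([3.0, 1.0])
hinge_total = euclidean_ranking_loss(d_p, d_n, margin=1.0)
log_lik = logistic_log_likelihood(d_p, d_n, t=np.array([0.0, 1.0]))
```

The hinge ignores triplets that already satisfy the margin, whereas the logistic form assigns every triplet a smooth, non-zero contribution, which is one practical difference between the two objectives.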
Table I shows that the results of the CLL method are better than those of ED and LD for similarity ranking.

C. Comparison with Other State-of-the-art Methods

In this experiment, we used the DB1.0 and DB1.1 [16] databases for training, and ICDAR-2013 for testing. To assess the performance of the proposed method, we compare it with HCCR [26], which reports very good results on the ICDAR-2013 dataset. The recognition results on the ICDAR-2013 test set are shown in Table II. First, we compare the recognition results of GoogleNet and the presented network architecture with the same loss function in Table II. The presented architecture yields a higher recognition rate than the baseline GoogleNet architecture. Next, compared with the state-of-the-art result of the previous work [25], our method achieves a significant improvement, with a relative error rate reduction of 19.45%. It is worth noting that the memory cost of the presented model is 36.20MB, which is larger than that of the baseline GoogleNet model (27.77MB) and the directmap model (23.50MB). This is because the proposed model combines the outputs of all Inception layers, with their output filter banks concatenated into a fully connected layer.

VI. CONCLUSION

This paper shows that the character classification and similarity ranking supervisory signals are complementary to each other: together they can increase inter-class variations and reduce intra-class variations, and therefore achieve better classification performance. Combining the two supervisory signals leads to significantly better results than softmax based character classification alone. Experiments on the ICDAR 2013 offline HCCR competition dataset show that our best result is superior to all previous works. The best testing error rate we achieved is 2.36%, which is a new state-of-the-art record to the best of our knowledge.
ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China under Grants Nos , and the Chongqing Research Program of Basic Research and Frontier Technology (No. cstc2016jcyja0011). The two Titan X GPUs used for this research were donated by the NVIDIA Corporation.

REFERENCES

[1] B. Chen, B. Zhu, and M. Nakagawa. Training of an on-line handwritten Japanese character recognizer by artificial patterns. Pattern Recognition Letters, 35.
[2] L. Chen, S. Wang, S. Wang, J. Sun, and J. Sun. Reconstruction combined training for convolutional neural networks on character recognition. In International Conference on Document Analysis and Recognition.
[3] C. Dan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. In International Conference on Computer Vision and Pattern Recognition.
[4] C. Dan and J. Schmidhuber. Multi-column deep neural networks for offline handwritten Chinese character classification. In arXiv.
[5] S. Ding, L. Lin, G. Wang, and H. Chao. Deep feature learning with relative distance comparison for person re-identification. Pattern Recognition, 48(1).
[6] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In arXiv.
[7] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning.
[8] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM International Conference on Multimedia.
[9] X.-B. Jin, C.-L. Liu, and X. Hou. Regularized margin-based conditional log-likelihood loss for prototype learning. Pattern Recognition, 43(7).
[10] L. Jinguo, Y. Deng, and C. Huang. Targeting ultimate accuracy: Face recognition via deep embedding. In arXiv.
[11] I.-J. Kim, C. Choi, and C. Choi. Improving discrimination ability of convolutional neural networks by hybrid learning. IJDAR, 19(1):1-9.
[12] F. Kimura, K. Takashina, S. Tsuruoka, and Y. Miyake. Modified quadratic discriminant functions and the application to Chinese character recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(1).
[13] C.-L. Liu. Normalization-cooperated gradient feature extraction for handwritten character recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(8).
[14] C.-L. Liu and K. Marukawa. Pseudo two-dimensional shape normalization methods for handwritten Chinese character recognition. Pattern Recognition, 38(12).
[15] C.-L. Liu, H. Sako, and H. Fujisawa. Discriminative learning quadratic discriminant function for handwriting recognition. IEEE Transactions on Neural Networks, 15(2).
[16] C.-L. Liu, F. Yin, D.-H. Wang, and Q.-F. Wang. Online and offline handwritten Chinese character recognition: Benchmarking on new databases. Pattern Recognition, 46(1).
[17] T. V. Phan, J. Gao, B. Zhu, and M. Nakagawa. Effects of line densities on nonlinear normalization for online handwritten Japanese character recognition. In International Conference on Document Analysis and Recognition.
[18] F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A unified embedding for face recognition and clustering. In International Conference on Computer Vision and Pattern Recognition.
[19] C. Szegedy, V. Vanhoucke, S. Ioffe, and J. Shlens. Rethinking the Inception architecture for computer vision. In arXiv.
[20] J. Wang, Y. Song, T. Leung, C. Rosenberg, J. Wang, J. Philbin, B. Chen, and Y. Wu. Learning fine-grained image similarity with deep ranking. In International Conference on Computer Vision and Pattern Recognition.
[21] C. Wu, W. Fan, Y. He, J. Sun, and S. Naoi. Handwritten character recognition by alternately trained relaxation convolutional neural network. In International Conference on Frontiers in Handwriting Recognition.
[22] R. Wu, S. Yan, Y. Shan, Q. Dang, and G. Sun. Deep image: Scaling up image recognition. In arXiv.
[23] F. Yin, Q.-F. Wang, X.-Y. Zhang, and C.-L. Liu. ICDAR 2013 Chinese handwriting recognition competition. In International Conference on Document Analysis and Recognition.
[24] R. Zhang, L. Lin, R. Zhang, W. Zuo, and L. Zhang. Bit-scalable deep hashing with regularized similarity learning for image retrieval and person re-identification. IEEE Transactions on Image Processing, 24(12).
[25] X.-Y. Zhang, Y. Bengio, and C.-L. Liu. Online and offline handwritten Chinese character recognition: A comprehensive study and new benchmark. Accepted by Pattern Recognition.
[26] Z. Zhong, L. Jin, and Z. Xie. High performance offline handwritten Chinese character recognition using GoogLeNet and directional feature maps. In International Conference on Document Analysis and Recognition.

Channel Locality Block: A Variant of Squeeze-and-Excitation

Channel Locality Block: A Variant of Squeeze-and-Excitation Channel Locality Block: A Variant of Squeeze-and-Excitation 1 st Huayu Li Northern Arizona University Flagstaff, United State Northern Arizona University hl459@nau.edu arxiv:1901.01493v1 [cs.lg] 6 Jan

More information

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS Kuan-Chuan Peng and Tsuhan Chen School of Electrical and Computer Engineering, Cornell University, Ithaca, NY

More information

Proceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol I IMECS 2018, March 14-16, 2018, Hong Kong

Proceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol I IMECS 2018, March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong TABLE I CLASSIFICATION ACCURACY OF DIFFERENT PRE-TRAINED MODELS ON THE TEST DATA

More information

HENet: A Highly Efficient Convolutional Neural. Networks Optimized for Accuracy, Speed and Storage

HENet: A Highly Efficient Convolutional Neural. Networks Optimized for Accuracy, Speed and Storage HENet: A Highly Efficient Convolutional Neural Networks Optimized for Accuracy, Speed and Storage Qiuyu Zhu Shanghai University zhuqiuyu@staff.shu.edu.cn Ruixin Zhang Shanghai University chriszhang96@shu.edu.cn

More information

FaceNet. Florian Schroff, Dmitry Kalenichenko, James Philbin Google Inc. Presentation by Ignacio Aranguren and Rahul Rana

FaceNet. Florian Schroff, Dmitry Kalenichenko, James Philbin Google Inc. Presentation by Ignacio Aranguren and Rahul Rana FaceNet Florian Schroff, Dmitry Kalenichenko, James Philbin Google Inc. Presentation by Ignacio Aranguren and Rahul Rana Introduction FaceNet learns a mapping from face images to a compact Euclidean Space

More information

Convolution Neural Networks for Chinese Handwriting Recognition

Convolution Neural Networks for Chinese Handwriting Recognition Convolution Neural Networks for Chinese Handwriting Recognition Xu Chen Stanford University 450 Serra Mall, Stanford, CA 94305 xchen91@stanford.edu Abstract Convolutional neural networks have been proven

More information

Robust Face Recognition Based on Convolutional Neural Network

Robust Face Recognition Based on Convolutional Neural Network 2017 2nd International Conference on Manufacturing Science and Information Engineering (ICMSIE 2017) ISBN: 978-1-60595-516-2 Robust Face Recognition Based on Convolutional Neural Network Ying Xu, Hui Ma,

More information

Content-Based Image Recovery

Content-Based Image Recovery Content-Based Image Recovery Hong-Yu Zhou and Jianxin Wu National Key Laboratory for Novel Software Technology Nanjing University, China zhouhy@lamda.nju.edu.cn wujx2001@nju.edu.cn Abstract. We propose

More information

Dual Learning of the Generator and Recognizer for Chinese Characters

Dual Learning of the Generator and Recognizer for Chinese Characters 2017 4th IAPR Asian Conference on Pattern Recognition Dual Learning of the Generator and Recognizer for Chinese Characters Yixing Zhu, Jun Du and Jianshu Zhang National Engineering Laboratory for Speech

More information

Deep Neural Networks for Recognizing Online Handwritten Mathematical Symbols

Deep Neural Networks for Recognizing Online Handwritten Mathematical Symbols Deep Neural Networks for Recognizing Online Handwritten Mathematical Symbols Hai Dai Nguyen 1, Anh Duc Le 2 and Masaki Nakagawa 3 Tokyo University of Agriculture and Technology 2-24-16 Nakacho, Koganei-shi,

More information

Supplementary material for Analyzing Filters Toward Efficient ConvNet

Supplementary material for Analyzing Filters Toward Efficient ConvNet Supplementary material for Analyzing Filters Toward Efficient Net Takumi Kobayashi National Institute of Advanced Industrial Science and Technology, Japan takumi.kobayashi@aist.go.jp A. Orthonormal Steerable

More information

Kaggle Data Science Bowl 2017 Technical Report

Kaggle Data Science Bowl 2017 Technical Report Kaggle Data Science Bowl 2017 Technical Report qfpxfd Team May 11, 2017 1 Team Members Table 1: Team members Name E-Mail University Jia Ding dingjia@pku.edu.cn Peking University, Beijing, China Aoxue Li

More information

CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015

CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015 CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015 Etienne Gadeski, Hervé Le Borgne, and Adrian Popescu CEA, LIST, Laboratory of Vision and Content Engineering, France

More information

Real-time Object Detection CS 229 Course Project

Real-time Object Detection CS 229 Course Project Real-time Object Detection CS 229 Course Project Zibo Gong 1, Tianchang He 1, and Ziyi Yang 1 1 Department of Electrical Engineering, Stanford University December 17, 2016 Abstract Objection detection

More information

MULTI-VIEW FEATURE FUSION NETWORK FOR VEHICLE RE- IDENTIFICATION

MULTI-VIEW FEATURE FUSION NETWORK FOR VEHICLE RE- IDENTIFICATION MULTI-VIEW FEATURE FUSION NETWORK FOR VEHICLE RE- IDENTIFICATION Haoran Wu, Dong Li, Yucan Zhou, and Qinghua Hu School of Computer Science and Technology, Tianjin University, Tianjin, China ABSTRACT Identifying

More information

Deep Learning for Computer Vision II

Deep Learning for Computer Vision II IIIT Hyderabad Deep Learning for Computer Vision II C. V. Jawahar Paradigm Shift Feature Extraction (SIFT, HoG, ) Part Models / Encoding Classifier Sparrow Feature Learning Classifier Sparrow L 1 L 2 L

More information

Deep Convolutional Neural Network using Triplet of Faces, Deep Ensemble, and Scorelevel Fusion for Face Recognition

Deep Convolutional Neural Network using Triplet of Faces, Deep Ensemble, and Scorelevel Fusion for Face Recognition IEEE 2017 Conference on Computer Vision and Pattern Recognition Deep Convolutional Neural Network using Triplet of Faces, Deep Ensemble, and Scorelevel Fusion for Face Recognition Bong-Nam Kang*, Yonghyun

More information

Handwriting Character Recognition as a Service:A New Handwriting Recognition System Based on Cloud Computing

Handwriting Character Recognition as a Service:A New Handwriting Recognition System Based on Cloud Computing 2011 International Conference on Document Analysis and Recognition Handwriting Character Recognition as a Service:A New Handwriting Recognition Based on Cloud Computing Yan Gao, Lanwen Jin +, Cong He,

More information

Study of Residual Networks for Image Recognition

Study of Residual Networks for Image Recognition Study of Residual Networks for Image Recognition Mohammad Sadegh Ebrahimi Stanford University sadegh@stanford.edu Hossein Karkeh Abadi Stanford University hosseink@stanford.edu Abstract Deep neural networks

More information

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Today s class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization:

More information

Learning-Based Candidate Segmentation Scoring for Real-Time Recognition of Online Overlaid Chinese Handwriting

Learning-Based Candidate Segmentation Scoring for Real-Time Recognition of Online Overlaid Chinese Handwriting 2013 12th International Conference on Document Analysis and Recognition Learning-Based Candidate Segmentation Scoring for Real-Time Recognition of Online Overlaid Chinese Handwriting Yan-Fei Lv 1, Lin-Lin

More information

Face Recognition A Deep Learning Approach

Face Recognition A Deep Learning Approach Face Recognition A Deep Learning Approach Lihi Shiloh Tal Perl Deep Learning Seminar 2 Outline What about Cat recognition? Classical face recognition Modern face recognition DeepFace FaceNet Comparison

More information

Feature-Fused SSD: Fast Detection for Small Objects

Feature-Fused SSD: Fast Detection for Small Objects Feature-Fused SSD: Fast Detection for Small Objects Guimei Cao, Xuemei Xie, Wenzhe Yang, Quan Liao, Guangming Shi, Jinjian Wu School of Electronic Engineering, Xidian University, China xmxie@mail.xidian.edu.cn

More information

Elastic Neural Networks for Classification

Elastic Neural Networks for Classification Elastic Neural Networks for Classification Yi Zhou 1, Yue Bai 1, Shuvra S. Bhattacharyya 1, 2 and Heikki Huttunen 1 1 Tampere University of Technology, Finland, 2 University of Maryland, USA arxiv:1810.00589v3

More information

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK Wenjie Guan, YueXian Zou*, Xiaoqun Zhou ADSPLAB/Intelligent Lab, School of ECE, Peking University, Shenzhen,518055, China

More information

Clustering and Unsupervised Anomaly Detection with l 2 Normalized Deep Auto-Encoder Representations

Clustering and Unsupervised Anomaly Detection with l 2 Normalized Deep Auto-Encoder Representations Clustering and Unsupervised Anomaly Detection with l 2 Normalized Deep Auto-Encoder Representations Caglar Aytekin, Xingyang Ni, Francesco Cricri and Emre Aksu Nokia Technologies, Tampere, Finland Corresponding

More information

Deep Learning with Tensorflow AlexNet

Deep Learning with Tensorflow   AlexNet Machine Learning and Computer Vision Group Deep Learning with Tensorflow http://cvml.ist.ac.at/courses/dlwt_w17/ AlexNet Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, "Imagenet classification

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

arxiv: v1 [cs.cv] 6 Jul 2016

arxiv: v1 [cs.cv] 6 Jul 2016 arxiv:607.079v [cs.cv] 6 Jul 206 Deep CORAL: Correlation Alignment for Deep Domain Adaptation Baochen Sun and Kate Saenko University of Massachusetts Lowell, Boston University Abstract. Deep neural networks

Based on improved STN-CNN facial expression recognition

Based on improved STN-CNN facial expression recognition Journal of Computing and Electronic Information Management ISSN: 2413-1660 Based on improved STN-CNN facial expression recognition Jianfei Ding Automated institute, Chongqing University of Posts and Telecommunications,

3D model classification using convolutional neural network

3D model classification using convolutional neural network 3D model classification using convolutional neural network JunYoung Gwak Stanford jgwak@cs.stanford.edu Abstract Our goal is to classify 3D models directly using convolutional neural network. Most of existing

Perceptron: This is convolution!

Perceptron: This is convolution! Perceptron: This is convolution! v v v Shared weights v Filter = local perceptron. Also called kernel. By pooling responses at different locations, we gain robustness to the exact spatial location of image

LARGE-SCALE PERSON RE-IDENTIFICATION AS RETRIEVAL

LARGE-SCALE PERSON RE-IDENTIFICATION AS RETRIEVAL LARGE-SCALE PERSON RE-IDENTIFICATION AS RETRIEVAL Hantao Yao 1,2, Shiliang Zhang 3, Dongming Zhang 1, Yongdong Zhang 1,2, Jintao Li 1, Yu Wang 4, Qi Tian 5 1 Key Lab of Intelligent Information Processing

Multi-Glance Attention Models For Image Classification

Multi-Glance Attention Models For Image Classification Multi-Glance Attention Models For Image Classification Chinmay Duvedi Stanford University Stanford, CA cduvedi@stanford.edu Pararth Shah Stanford University Stanford, CA pararth@stanford.edu Abstract We

Cross-domain Deep Encoding for 3D Voxels and 2D Images

Cross-domain Deep Encoding for 3D Voxels and 2D Images Cross-domain Deep Encoding for 3D Voxels and 2D Images Jingwei Ji Stanford University jingweij@stanford.edu Danyang Wang Stanford University danyangw@stanford.edu 1. Introduction 3D reconstruction is one

Computer Vision Lecture 16

Computer Vision Lecture 16 Announcements Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Seminar registration period starts on Friday We will offer a lab course in the summer semester Deep Robot Learning Topic:

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period starts

Joint Unsupervised Learning of Deep Representations and Image Clusters Supplementary materials

Joint Unsupervised Learning of Deep Representations and Image Clusters Supplementary materials Joint Unsupervised Learning of Deep Representations and Image Clusters Supplementary materials Jianwei Yang, Devi Parikh, Dhruv Batra Virginia Tech {jw2yang, parikh, dbatra}@vt.edu Abstract This supplementary

Automatic detection of books based on Faster R-CNN

Automatic detection of books based on Faster R-CNN Automatic detection of books based on Faster R-CNN Beibei Zhu, Xiaoyu Wu, Lei Yang, Yinghua Shen School of Information Engineering, Communication University of China Beijing, China e-mail: zhubeibei@cuc.edu.cn,

Learning image representations equivariant to ego-motion (Supplementary material)

Learning image representations equivariant to ego-motion (Supplementary material) Learning image representations equivariant to ego-motion (Supplementary material) Dinesh Jayaraman UT Austin dineshj@cs.utexas.edu Kristen Grauman UT Austin grauman@cs.utexas.edu max-pool (3x3, stride2)

MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection

MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection ILSVRC 2016 Object Detection from Video Byungjae Lee¹, Songguo Jin¹, Enkhbayar Erdenee¹, Mi Young Nam², Young Gui Jung², Phill Kyu

Deep Learning and Its Applications

Deep Learning and Its Applications Convolutional Neural Network and Its Application in Image Recognition Oct 28, 2016 Outline 1 A Motivating Example 2 The Convolutional Neural Network (CNN) Model 3 Training the CNN Model 4 Issues and Recent

Supplementary Material: Unconstrained Salient Object Detection via Proposal Subset Optimization

Supplementary Material: Unconstrained Salient Object Detection via Proposal Subset Optimization Supplementary Material: Unconstrained Salient Object via Proposal Subset Optimization 1. Proof of the Submodularity According to Eqns. 10-12 in our paper, the objective function of the proposed optimization

arxiv: v1 [cs.cv] 16 Nov 2015

arxiv: v1 [cs.cv] 16 Nov 2015 Coarse-to-fine Face Alignment with Multi-Scale Local Patch Regression Zhiao Huang hza@megvii.com Erjin Zhou zej@megvii.com Zhimin Cao czm@megvii.com arxiv:1511.04901v1 [cs.cv] 16 Nov 2015 Abstract Facial

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS Puyang Xu, Ruhi Sarikaya Microsoft Corporation ABSTRACT We describe a joint model for intent detection and slot filling based

YOLO9000: Better, Faster, Stronger

YOLO9000: Better, Faster, Stronger YOLO9000: Better, Faster, Stronger Date: January 24, 2018 Prepared by Haris Khan (University of Toronto) Haris Khan CSC2548: Machine Learning in Computer Vision 1 Overview 1. Motivation for one-shot object

Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor Supplemental Document

Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor Supplemental Document Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor Supplemental Document Franziska Mueller 1,2 Dushyant Mehta 1,2 Oleksandr Sotnychenko 1 Srinath Sridhar 1 Dan Casas 3 Christian Theobalt

Neural Networks with Input Specified Thresholds

Neural Networks with Input Specified Thresholds Neural Networks with Input Specified Thresholds Fei Liu Stanford University liufei@stanford.edu Junyang Qian Stanford University junyangq@stanford.edu Abstract In this project report, we propose a method

Image Captioning with Object Detection and Localization

Image Captioning with Object Detection and Localization Image Captioning with Object Detection and Localization Zhongliang Yang, Yu-Jin Zhang, Sadaqat ur Rehman, Yongfeng Huang, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China

learning stage (Stage 1), CNNH learns approximate hash codes for training images by optimizing the following loss function:

learning stage (Stage 1), CNNH learns approximate hash codes for training images by optimizing the following loss function: 1 Query-adaptive Image Retrieval by Deep Weighted Hashing Jian Zhang and Yuxin Peng arxiv:1612.02541v2 [cs.cv] 9 May 2017 Abstract Hashing methods have attracted much attention for large scale image retrieval.

Joint Object Detection and Viewpoint Estimation using CNN features

Joint Object Detection and Viewpoint Estimation using CNN features Joint Object Detection and Viewpoint Estimation using CNN features Carlos Guindel, David Martín and José M. Armingol cguindel@ing.uc3m.es Intelligent Systems Laboratory Universidad Carlos III de Madrid

An efficient face recognition algorithm based on multi-kernel regularization learning

An efficient face recognition algorithm based on multi-kernel regularization learning Acta Technica 61, No. 4A/2016, 75-84 © 2017 Institute of Thermomechanics CAS, v.v.i. An efficient face recognition algorithm based on multi-kernel regularization learning Bi Rongrong 1 Abstract. A novel

A Patch Strategy for Deep Face Recognition

A Patch Strategy for Deep Face Recognition A Patch Strategy for Deep Face Recognition Yanhong Zhang a, Kun Shang a, Jun Wang b, Nan Li a, Monica M.Y. Zhang c a Center for Applied Mathematics, Tianjin University, Tianjin 300072, P.R. China b School

Structured Prediction using Convolutional Neural Networks

Structured Prediction using Convolutional Neural Networks Overview Structured Prediction using Convolutional Neural Networks Bohyung Han bhhan@postech.ac.kr Computer Vision Lab. Convolutional Neural Networks (CNNs) Structured predictions for low level computer

Ensemble Soft-Margin Softmax Loss for Image Classification

Ensemble Soft-Margin Softmax Loss for Image Classification Ensemble Soft-Margin Softmax Loss for Image Classification Xiaobo Wang 1,2,, Shifeng Zhang 1,2,, Zhen Lei 1,2,, Si Liu 3, Xiaojie Guo 4, Stan Z. Li 5,1,2 1 CBSR&NLPR, Institute of Automation, Chinese Academy

SEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic

SEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic SEMANTIC COMPUTING Lecture 8: Introduction to Deep Learning Dagmar Gromann International Center For Computational Logic TU Dresden, 7 December 2018 Overview Introduction Deep Learning General Neural Networks

Real Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications

Real Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications Real Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications Anand Joshi CS229-Machine Learning, Computer Science, Stanford University,

arxiv: v1 [cs.lg] 12 Jul 2018

arxiv: v1 [cs.lg] 12 Jul 2018 arxiv:1807.04585v1 [cs.lg] 12 Jul 2018 Deep Learning for Imbalance Data Classification using Class Expert Generative Adversarial Network Fanny a, Tjeng Wawan Cenggoro a,b a Computer Science Department,

arxiv: v1 [cs.cv] 20 Dec 2016

arxiv: v1 [cs.cv] 20 Dec 2016 End-to-End Pedestrian Collision Warning System based on a Convolutional Neural Network with Semantic Segmentation arxiv:1612.06558v1 [cs.cv] 20 Dec 2016 Heechul Jung heechul@dgist.ac.kr Min-Kook Choi mkchoi@dgist.ac.kr

International Journal of Computer Engineering and Applications, Volume XII, Special Issue, September 18,

International Journal of Computer Engineering and Applications, Volume XII, Special Issue, September 18, REAL-TIME OBJECT DETECTION WITH CONVOLUTION NEURAL NETWORK USING KERAS Asmita Goswami [1], Lokesh Soni [2 ] Department of Information Technology [1] Jaipur Engineering College and Research Center Jaipur[2]

A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation

A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation , pp.162-167 http://dx.doi.org/10.14257/astl.2016.138.33 A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation Liqiang Hu, Chaofeng He Shijiazhuang Tiedao University,

Artificial Neural Networks. Introduction to Computational Neuroscience Ardi Tampuu

Artificial Neural Networks. Introduction to Computational Neuroscience Ardi Tampuu Artificial Neural Networks Introduction to Computational Neuroscience Ardi Tampuu 7.0.206 Artificial neural network NB! Inspired by biology, not based on biology! Applications Automatic speech recognition

Convolutional Neural Networks

Convolutional Neural Networks NPFL114, Lecture 4 Convolutional Neural Networks Milan Straka March 25, 2019 Charles University in Prague Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics unless otherwise

Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling

Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling [DOI: 10.2197/ipsjtcva.7.99] Express Paper Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling Takayoshi Yamashita 1,a) Takaya Nakamura 1 Hiroshi Fukui 1,b) Yuji

arxiv: v4 [cs.cv] 30 May 2018

arxiv: v4 [cs.cv] 30 May 2018 Additive Margin Softmax for Face Verification Feng Wang UESTC feng.wff@gmail.com Weiyang Liu Georgia Tech wyliu@gatech.edu Haijun Liu UESTC haijun liu@26.com Jian Cheng UESTC chengjian@uestc.edu.cn arxiv:1801.05599v4

Inception Network Overview. David White CS793

Inception Network Overview. David White CS793 Inception Network Overview David White CS793 So, Leonardo DiCaprio dreams about dreaming... https://m.media-amazon.com/images/m/mv5bmjaxmzy3njcxnf5bml5banbnxkftztcwnti5otm0mw@@._v1_sy1000_cr0,0,675,1 000_AL_.jpg

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab. [ICIP 2017] Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab., POSTECH Pedestrian Detection Goal To draw bounding boxes that

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION Kingsley Kuan 1, Gaurav Manek 1, Jie Lin 1, Yuan Fang 1, Vijay Chandrasekhar 1,2 Institute for Infocomm Research, A*STAR, Singapore 1 Nanyang Technological

Convolutional Layer Pooling Layer Fully Connected Layer Regularization

Convolutional Layer Pooling Layer Fully Connected Layer Regularization Semi-Parallel Deep Neural Networks (SPDNN), Convergence and Generalization Shabab Bazrafkan, Peter Corcoran Center for Cognitive, Connected & Computational Imaging, College of Engineering & Informatics,

Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser SUPPLEMENTARY MATERIALS

Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser SUPPLEMENTARY MATERIALS Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser SUPPLEMENTARY MATERIALS Fangzhou Liao, Ming Liang, Yinpeng Dong, Tianyu Pang, Xiaolin Hu, Jun Zhu Department of Computer

Deconvolutions in Convolutional Neural Networks

Deconvolutions in Convolutional Neural Networks Overview Deconvolutions in Convolutional Neural Networks Bohyung Han bhhan@postech.ac.kr Computer Vision Lab. Convolutional Neural Networks (CNNs) Deconvolutions in CNNs Applications Network visualization

Measuring Artistic Similarity of Paintings

Measuring Artistic Similarity of Paintings Measuring Artistic Similarity of Paintings Jay Whang Stanford SCPD jaywhang@stanford.edu Buhuang Liu Stanford SCPD buhuang@stanford.edu Yancheng Xiao Stanford SCPD ycxiao@stanford.edu Abstract In this project,

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning for Object Categorization 14.01.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period

Video Gesture Recognition with RGB-D-S Data Based on 3D Convolutional Networks

Video Gesture Recognition with RGB-D-S Data Based on 3D Convolutional Networks Video Gesture Recognition with RGB-D-S Data Based on 3D Convolutional Networks August 16, 2016 1 Team details Team name FLiXT Team leader name Yunan Li Team leader address, phone number and email address:

Ryerson University CP8208. Soft Computing and Machine Intelligence. Naive Road-Detection using CNNS. Authors: Sarah Asiri - Domenic Curro

Ryerson University CP8208. Soft Computing and Machine Intelligence. Naive Road-Detection using CNNS. Authors: Sarah Asiri - Domenic Curro Ryerson University CP8208 Soft Computing and Machine Intelligence Naive Road-Detection using CNNS Authors: Sarah Asiri - Domenic Curro April 24 2016 Contents 1 Abstract 2 2 Introduction 2 3 Motivation

Removing rain from single images via a deep detail network

Removing rain from single images via a deep detail network 2017 IEEE Conference on Computer Vision and Pattern Recognition Removing rain from single images via a deep detail network Xueyang Fu Jiabin Huang Delu Zeng 2 Yue Huang Xinghao Ding John Paisley 3 Key Laboratory

COMP9444 Neural Networks and Deep Learning 7. Image Processing. COMP9444 © Alan Blair, 2017

COMP9444 Neural Networks and Deep Learning 7. Image Processing. COMP9444 © Alan Blair, 2017 COMP9444 Neural Networks and Deep Learning 7. Image Processing COMP9444 17s2 Image Processing 1 Outline Image Datasets and Tasks Convolution in Detail AlexNet Weight Initialization Batch Normalization

Online Japanese Character Recognition Using Trajectory-Based Normalization and Direction Feature Extraction

Online Japanese Character Recognition Using Trajectory-Based Normalization and Direction Feature Extraction Online Japanese Character Recognition Using Trajectory-Based Normalization and Direction Feature Extraction Cheng-Lin Liu, Xiang-Dong Zhou To cite this version: Cheng-Lin Liu, Xiang-Dong Zhou. Online Japanese

Rare Chinese Character Recognition by Radical Extraction Network

Rare Chinese Character Recognition by Radical Extraction Network 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC) Banff Center, Banff, Canada, October 5-8, 2017 Rare Chinese Character Recognition by Radical Extraction Network Ziang Yan, Chengzhe

Facial Expression Classification with Random Filters Feature Extraction

Facial Expression Classification with Random Filters Feature Extraction Facial Expression Classification with Random Filters Feature Extraction Mengye Ren Facial Monkey mren@cs.toronto.edu Zhi Hao Luo It s Me lzh@cs.toronto.edu I. ABSTRACT In our work, we attempted to tackle

Application of Convolutional Neural Network for Image Classification on Pascal VOC Challenge 2012 dataset

Application of Convolutional Neural Network for Image Classification on Pascal VOC Challenge 2012 dataset Application of Convolutional Neural Network for Image Classification on Pascal VOC Challenge 2012 dataset Suyash Shetty Manipal Institute of Technology suyash.shashikant@learner.manipal.edu Abstract In

FUSION MODEL BASED ON CONVOLUTIONAL NEURAL NETWORKS WITH TWO FEATURES FOR ACOUSTIC SCENE CLASSIFICATION

FUSION MODEL BASED ON CONVOLUTIONAL NEURAL NETWORKS WITH TWO FEATURES FOR ACOUSTIC SCENE CLASSIFICATION Please contact the conference organizers at dcasechallenge@gmail.com if you require an accessible file, as the files provided by ConfTool Pro to reviewers are filtered to remove author information, and

A Feature Selection Method to Handle Imbalanced Data in Text Classification

A Feature Selection Method to Handle Imbalanced Data in Text Classification A Feature Selection Method to Handle Imbalanced Data in Text Classification Fengxiang Chang 1*, Jun Guo 1, Weiran Xu 1, Kejun Yao 2 1 School of Information and Communication Engineering Beijing University

FACIAL POINT DETECTION BASED ON A CONVOLUTIONAL NEURAL NETWORK WITH OPTIMAL MINI-BATCH PROCEDURE. Chubu University 1200, Matsumoto-cho, Kasugai, AICHI

FACIAL POINT DETECTION BASED ON A CONVOLUTIONAL NEURAL NETWORK WITH OPTIMAL MINI-BATCH PROCEDURE. Chubu University 1200, Matsumoto-cho, Kasugai, AICHI FACIAL POINT DETECTION BASED ON A CONVOLUTIONAL NEURAL NETWORK WITH OPTIMAL MINI-BATCH PROCEDURE Masatoshi Kimura Takayoshi Yamashita Yu Yamauchi Hironobu Fuyoshi* Chubu University 1200, Matsumoto-cho,

Deep Learning for Face Recognition. Xiaogang Wang Department of Electronic Engineering, The Chinese University of Hong Kong

Deep Learning for Face Recognition. Xiaogang Wang Department of Electronic Engineering, The Chinese University of Hong Kong Deep Learning for Face Recognition Xiaogang Wang Department of Electronic Engineering, The Chinese University of Hong Kong Deep Learning Results on LFW Method Accuracy (%) # points # training images Huang

Convolution Neural Network for Traditional Chinese Calligraphy Recognition

Convolution Neural Network for Traditional Chinese Calligraphy Recognition Convolution Neural Network for Traditional Chinese Calligraphy Recognition Boqi Li Mechanical Engineering Stanford University boqili@stanford.edu Abstract script. Fig. 1 shows examples of the same TCC

Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material

Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material Charles R. Qi Hao Su Matthias Nießner Angela Dai Mengyuan Yan Leonidas J. Guibas Stanford University 1. Details

A Touching Character Database from Chinese Handwriting for Assessing Segmentation Algorithms

A Touching Character Database from Chinese Handwriting for Assessing Segmentation Algorithms 2012 International Conference on Frontiers in Handwriting Recognition A Touching Character Database from Chinese Handwriting for Assessing Segmentation Algorithms Liang Xu, Fei Yin, Qiu-Feng Wang, Cheng-Lin

Dynamic Routing Between Capsules

Dynamic Routing Between Capsules Report Explainable Machine Learning Dynamic Routing Between Capsules Author: Michael Dorkenwald Supervisor: Dr. Ullrich Köthe 28 June 2018 Contents 1 Introduction 2 2 Motivation 2 3 CapsuleNet

arxiv: v2 [cs.cv] 30 Oct 2018

arxiv: v2 [cs.cv] 30 Oct 2018 Adversarial Noise Layer: Regularize Neural Network By Adding Noise Zhonghui You, Jinmian Ye, Kunming Li, Zenglin Xu, Ping Wang School of Electronics Engineering and Computer Science, Peking University

RSRN: Rich Side-output Residual Network for Medial Axis Detection

RSRN: Rich Side-output Residual Network for Medial Axis Detection RSRN: Rich Side-output Residual Network for Medial Axis Detection Chang Liu, Wei Ke, Jianbin Jiao, and Qixiang Ye University of Chinese Academy of Sciences, Beijing, China {liuchang615, kewei11}@mails.ucas.ac.cn,

HCL2000 A Large-scale Handwritten Chinese Character Database for Handwritten Character Recognition

HCL2000 A Large-scale Handwritten Chinese Character Database for Handwritten Character Recognition 2009 10th International Conference on Document Analysis and Recognition HCL2000 A Large-scale Handwritten Chinese Character Database for Handwritten Character Recognition Honggang Zhang, Jun Guo, Guang

3D Densely Convolutional Networks for Volumetric Segmentation. Toan Duc Bui, Jitae Shin, and Taesup Moon

3D Densely Convolutional Networks for Volumetric Segmentation. Toan Duc Bui, Jitae Shin, and Taesup Moon 3D Densely Convolutional Networks for Volumetric Segmentation Toan Duc Bui, Jitae Shin, and Taesup Moon School of Electronic and Electrical Engineering, Sungkyunkwan University, Republic of Korea arxiv:1709.03199v2

IDENTIFYING PHOTOREALISTIC COMPUTER GRAPHICS USING CONVOLUTIONAL NEURAL NETWORKS

IDENTIFYING PHOTOREALISTIC COMPUTER GRAPHICS USING CONVOLUTIONAL NEURAL NETWORKS IDENTIFYING PHOTOREALISTIC COMPUTER GRAPHICS USING CONVOLUTIONAL NEURAL NETWORKS In-Jae Yu, Do-Guk Kim, Jin-Seok Park, Jong-Uk Hou, Sunghee Choi, and Heung-Kyu Lee Korea Advanced Institute of Science and

CENG 783. Special topics in. Deep Learning. AlchemyAPI. Week 11. Sinan Kalkan

CENG 783. Special topics in. Deep Learning. AlchemyAPI. Week 11. Sinan Kalkan CENG 783 Special topics in Deep Learning AlchemyAPI Week 11 Sinan Kalkan TRAINING A CNN Fig: http://www.robots.ox.ac.uk/~vgg/practicals/cnn/ Feed-forward pass Note that this is written in terms of the

A Sparse and Locally Shift Invariant Feature Extractor Applied to Document Images

A Sparse and Locally Shift Invariant Feature Extractor Applied to Document Images A Sparse and Locally Shift Invariant Feature Extractor Applied to Document Images Marc Aurelio Ranzato Yann LeCun Courant Institute of Mathematical Sciences New York University - New York, NY 10003 Abstract

Face Recognition by Combining Kernel Associative Memory and Gabor Transforms

Face Recognition by Combining Kernel Associative Memory and Gabor Transforms Face Recognition by Combining Kernel Associative Memory and Gabor Transforms Author Zhang, Bai-ling, Leung, Clement, Gao, Yongsheng Published 2006 Conference Title ICPR2006: 18th International Conference

Cursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network

Cursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network Cursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network Utkarsh Dwivedi 1, Pranjal Rajput 2, Manish Kumar Sharma 3 1UG Scholar, Dept. of CSE, GCET, Greater Noida,

Human Action Recognition Using CNN and BoW Methods Stanford University CS229 Machine Learning Spring 2016

Human Action Recognition Using CNN and BoW Methods Stanford University CS229 Machine Learning Spring 2016 Human Action Recognition Using CNN and BoW Methods Stanford University CS229 Machine Learning Spring 2016 Max Wang mwang07@stanford.edu Ting-Chun Yeh chun618@stanford.edu I. Introduction Recognizing human

Densely Connected Bidirectional LSTM with Applications to Sentence Classification

Densely Connected Bidirectional LSTM with Applications to Sentence Classification Densely Connected Bidirectional LSTM with Applications to Sentence Classification Zixiang Ding 1, Rui Xia 1(B), Jianfei Yu 2,XiangLi 1, and Jian Yang 1 1 School of Computer Science and Engineering, Nanjing
