Handwritten Chinese Character Recognition by Joint Classification and Similarity Ranking


2016 15th International Conference on Frontiers in Handwriting Recognition

Cheng Cheng, Xu-Yao Zhang, Xiao-Hu Shao and Xiang-Dong Zhou
Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences
Institute of Automation, Chinese Academy of Sciences
Email: {chengcheng, shaoxiaohu, zhouxiangdong}@cigit.ac.cn, xyz@nlpr.ia.ac.cn

Abstract: Deep convolutional neural networks (DCNN) have recently achieved state-of-the-art performance on handwritten Chinese character recognition (HCCR). However, most DCNN models employ the softmax activation function and minimize the cross-entropy loss, which may lose some inter-class information. To cope with this problem, we demonstrate a small but consistent advantage of using both classification and similarity ranking signals as supervision. Specifically, the presented method learns a DCNN model by maximizing the inter-class variations and minimizing the intra-class variations, while simultaneously minimizing the cross-entropy loss. In addition, we review several loss functions for similarity ranking and evaluate their performance. Our experiments demonstrate that the presented method achieves state-of-the-art accuracy on the well-known ICDAR 2013 offline HCCR competition dataset.

Index Terms: Similarity Ranking, Character Recognition, Deep Convolutional Neural Networks

I. INTRODUCTION

Handwritten Chinese character recognition (HCCR) has been intensively studied over the past forty years and is of practical importance for bank check reading, tax form processing, transcription of books and handwritten notes, and so on. Although many studies have been conducted, it remains a challenging problem due to the diversity of writing styles, the large character set, and the presence of many confusing character pairs. Samples with different writing styles and confusing character pairs are shown in Fig. 1 and Fig. 2, respectively.

Fig. 1: Characters with different writing styles.

Fig. 2: Examples of confusing character pairs.

Existing HCCR methods can be mainly classified into two categories: traditional methods and DCNN-based methods. In the first category, there are typically four basic steps: shape normalization, feature extraction, dimensionality reduction, and classifier construction. To improve recognition performance, many effective methods have been proposed for these steps, including nonlinear normalization [17], pseudo two-dimensional normalization [14], gradient direction feature extraction [13], the modified quadratic discriminant function [12], and the discriminative learning quadratic discriminant function [15]. In the second category, a DCNN model composed of layers of convolution, rectification and pooling is trained via back propagation. Unlike traditional methods, DCNNs substitute the separate steps of feature extraction, dimensionality reduction and classifier construction with a single deep architecture, so that of the four steps only shape normalization is still required. This expressivity, combined with robust training algorithms, allows powerful object representations to be learned without the need for handcrafted features. However, most DCNN-based methods use the softmax activation function (also known as multinomial logistic regression) for classification, which we find may lose some inter-class information.

In this paper, we contribute to the second category and present a deep triplet network (DTN) method, whose basic idea is illustrated in Fig. 3. Unlike most existing methods, the presented method learns a DCNN model using both classification and similarity ranking signals as supervision. The classification signal assigns an input image to one of a large number of identity classes, while the similarity ranking signal minimizes the intra-class distance and maximizes the inter-class distance. In addition, we investigate the loss functions of similarity ranking algorithms and aim to improve performance with a new form of loss function.

Fig. 3: The structure of the proposed model: a CNN followed by a fully connected layer whose output feeds both a softmax classifier and a triplet ranking loss.

The rest of this paper is organized as follows: Section II briefly introduces related previous work, Section III reviews the softmax and similarity ranking losses, Section IV presents the loss functions for similarity ranking, Section V presents our experimental results, and the last section concludes the paper.

II. RELATED WORK

A. HCCR by DCNN

In recent years, DCNNs have received increasing interest in computer vision and machine learning, and a number of DCNN methods have been proposed for HCCR in the literature [2], [3], [4], [21], continuing their success by winning both the online and offline HCCR competitions at ICDAR 2013 [23]. Generally, a DCNN aims to learn hierarchical feature representations by building high-level features from low-level ones. There are two notable breakthroughs. The first is large-scale character classification with DCNNs [25], [26], where domain-specific knowledge, such as Gabor features or normalization-cooperated direction-decomposed feature maps, is used to enhance the performance of the DCNN. The second is supervising the DCNN with both character reconstruction and verification tasks [2], where the reconstruction task minimizes the distance between features of the same category. In this paper, we extend the DCNN model using classification and similarity ranking signals as supervision.

B. Similarity Ranking

The presented method falls under the broad umbrella of similarity ranking. Similarity-ranking-based DCNN methods have proven effective in a wide range of tasks, such as face recognition [10], [18], person re-identification [5], [24] and image retrieval [20]. The framework of the above-mentioned papers organizes the training images into batches of triplet samples, each containing two images with the same label and one with a different label. With these triplets, they minimize the intra-class distance while maximizing the inter-class distance for each triplet unit under the Euclidean distance metric. In the field of character recognition, the closest method to our approach is the discriminative DCNN by Kim et al. [11], which focuses on the differences among similar classes and thereby improves the discrimination ability of the DCNN.

III. METHODOLOGY

We aim to train a DCNN model using both classification and similarity ranking signals as supervision. The first is the character classification signal, which classifies each character image into n (e.g., n = 3755) different categories. It is achieved by following the fully connected layer with an n-way softmax layer, which outputs a probability distribution over the n classes. The network is trained to minimize the cross-entropy loss,

L(f_i, k, \theta_{cls}) = -\sum_{c=1}^{n} \hat{p}_c \log p_c,   (1)

in which k is the true class label, L(f_i, k, \theta_{cls}) is the standard cross-entropy (log) loss, and \theta_{cls} denotes the softmax layer parameters. \hat{p}_c is the target probability distribution, where \hat{p}_c = 0 for all c except \hat{p}_k = 1 for the target class k, and p_c is the predicted probability of class c.
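For concreteness, the classification term of Eq. (1) can be sketched in a few lines of NumPy. This is only an illustrative sketch with our own variable names, not the authors' Caffe implementation; with a one-hot target, the sum in Eq. (1) reduces to the negative log-probability of the true class.

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over the n character classes
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def cross_entropy_loss(logits, k):
    # Eq. (1): with a one-hot target, the sum reduces to -log p_k
    p = softmax(logits)
    return -np.log(p[k])

# toy usage: n = 3755 classes, random scores for one character image
rng = np.random.default_rng(0)
logits = rng.normal(size=3755)
print(cross_entropy_loss(logits, k=42))
```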
The second is the similarity ranking signal, which projects character pairs into the same feature subspace, so that the distance of each positive character pair falls below a smaller threshold while that of each negative pair exceeds a larger threshold. We adopt the following loss function, originally proposed by Wang et al. [20] and widely used in face recognition [18], person re-identification [5] and image retrieval [24]:

L(f_i, f_j, f_k, \theta_{tri}) = \max(\|f_i - f_j\|_2^2 - \|f_i - f_k\|_2^2 + \Delta, 0),   (2)

in which f_i, f_j are the features of two character images with the same label, f_i, f_k are the features of two mismatched character images, \Delta is a margin enforced between positive and negative image pairs, and \theta_{tri} is the parameter to be learned in the similarity ranking loss function. Both loss functions are evaluated and compared in our experiments.

Our goal is to learn the parameters \theta_{con} of the DCNN model, while \theta_{tri} and \theta_{cls} are parameters introduced to propagate the classification and similarity ranking signals during training. In the testing stage, only \theta_{cls} and \theta_{con} are used for classification. The parameters are updated by stochastic gradient descent on each triplet unit. The gradients of the two supervisory signals are weighted by a hyperparameter \lambda. Our learning algorithm is summarized in Algorithm 1.

Algorithm 1 The learning algorithm
Require: training set \chi = {x_i, y_i}, initialized parameters \theta_{cls}, \theta_{con} and \theta_{tri}, hyperparameter \lambda
Ensure: parameters \theta_{cls}, \theta_{con} and \theta_{tri}
1: for m = 1 to iter do
2:   sample a triplet unit (x_i, y_i), (x_j, y_j) and (x_k, y_k) from \chi, in which x_i, x_j have the same label
3:   f_i = Conv(x_i, \theta_{con}), f_j = Conv(x_j, \theta_{con}), f_k = Conv(x_k, \theta_{con})
4:   \nabla\theta_{cls} = \partial L(f_i, y_i, \theta_{cls})/\partial\theta_{cls} + \partial L(f_j, y_j, \theta_{cls})/\partial\theta_{cls} + \partial L(f_k, y_k, \theta_{cls})/\partial\theta_{cls}
5:   \nabla\theta_{tri} = \lambda \, \partial L(f_i, f_j, f_k, \theta_{tri})/\partial\theta_{tri}
6:   \nabla f_i = \partial L(f_i, y_i, \theta_{cls})/\partial f_i + \lambda \, \partial L(f_i, f_j, f_k, \theta_{tri})/\partial f_i
7:   \nabla f_j = \partial L(f_j, y_j, \theta_{cls})/\partial f_j + \lambda \, \partial L(f_i, f_j, f_k, \theta_{tri})/\partial f_j
8:   \nabla f_k = \partial L(f_k, y_k, \theta_{cls})/\partial f_k + \lambda \, \partial L(f_i, f_j, f_k, \theta_{tri})/\partial f_k
9:   \nabla\theta_{con} = \nabla f_i \cdot \partial Conv(x_i, \theta_{con})/\partial\theta_{con} + \nabla f_j \cdot \partial Conv(x_j, \theta_{con})/\partial\theta_{con} + \nabla f_k \cdot \partial Conv(x_k, \theta_{con})/\partial\theta_{con}
10: end for
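The per-triplet objective that Algorithm 1 back-propagates through can be sketched as follows. This is a minimal NumPy sketch, not the authors' Caffe implementation; the gradients of lines 4-9 would be obtained by differentiating this quantity. The values of lam and margin are placeholders, since the paper does not report λ or Δ, and cls_losses stands for the three cross-entropy terms of Eq. (1) computed elsewhere.

```python
import numpy as np

def triplet_ranking_loss(f_i, f_j, f_k, margin=1.0):
    # Eq. (2): hinge on the squared-distance gap between the matched
    # pair (f_i, f_j) and the mismatched pair (f_i, f_k);
    # margin stands in for Delta (value not given in the paper)
    d_pos = float(np.sum((f_i - f_j) ** 2))
    d_neg = float(np.sum((f_i - f_k) ** 2))
    return max(d_pos - d_neg + margin, 0.0)

def joint_triplet_objective(cls_losses, f_i, f_j, f_k, lam=0.1, margin=1.0):
    # Algorithm 1, one triplet: the three classification terms of Eq. (1)
    # plus the lambda-weighted ranking term of Eq. (2);
    # lam is a placeholder, the paper does not report its value
    return sum(cls_losses) + lam * triplet_ranking_loss(f_i, f_j, f_k, margin)
```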

IV. LOSS FUNCTION FOR SIMILARITY RANKING

In this section, we describe and compare three different loss functions that can be used in the proposed framework.

A. Euclidean Distance

In the absence of prior knowledge, most similarity ranking methods use the simple Euclidean distance to measure the dissimilarity between examples represented as vector inputs. The cost function based on the triplet constraint of Eq. (2) has two competing terms: the first penalizes large distances between each input and its target neighbors, while the second penalizes small distances between each input and all other inputs that do not share the same label. Specifically, the cost function is given by

L = \sum_{n=1}^{N} \max(\|f_i - f_j\|_2^2 - \|f_i - f_k\|_2^2 + \Delta, 0),   (3)

where N is the number of triplets and (f_i, f_j, f_k) are the features of triplet n. For an active triplet, it is easy to calculate the derivatives of the loss with respect to the character features:

\partial L/\partial f_i = 2(f_i - f_j) - 2(f_i - f_k),
\partial L/\partial f_j = -2(f_i - f_j),   (4)
\partial L/\partial f_k = 2(f_i - f_k).

B. Logistic Discriminant Based

The Euclidean distance is sensitive to scale and blind to correlations across dimensions. To overcome this deficiency, we use a standard linear logistic discriminant function to model the triplet units as

p_n = \sigma(d) = \frac{1}{1 + e^{-d}},   (5)

in which d_p = \|f_i - f_j\|_2^2, d_n = \|f_i - f_k\|_2^2 and d = d_n - d_p. We model the probability p_n that triplet n = (i, j, k) is positive (i.e., fulfills the constraint in Eq. (2)); if d < 0, the triplet is misclassified. We use maximum log-likelihood to optimize the parameters of the model. The log-likelihood L can be written as

L = \sum_{n=1}^{N} [ t_n \ln p_n + (1 - t_n) \ln(1 - p_n) ],   (6)

where t_n indicates whether triplet n is positive.

C. Conditional Log-likelihood Loss

In [9], the generalization limitations of the above loss are discussed, and a regularization term is added to the loss function to avoid over-fitting during training as well as to maximize the hypothesis margin. Following [9], we rewrite the loss as

p_n = \sigma(d) + \alpha \|f_i - f_j\|_2^2,   (7)

where \alpha is the regularization coefficient. Intuitively, the regularizer pays more attention to the intra-class variations.
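A minimal NumPy sketch of the logistic discriminant and conditional log-likelihood variants of Eqs. (5)-(7) is given below. The variable names and the value of alpha are our own assumptions; in Eq. (7) the regularized quantity can exceed 1, so the likelihood computation clips it for numerical safety.

```python
import numpy as np

def ranking_probability(f_i, f_j, f_k, variant="LD", alpha=0.01):
    # probability that the triplet is "positive" (matched pair closer
    # than the mismatched pair), per Eqs. (5)-(7)
    d_p = np.sum((f_i - f_j) ** 2)          # matched-pair distance
    d_n = np.sum((f_i - f_k) ** 2)          # mismatched-pair distance
    p = 1.0 / (1.0 + np.exp(-(d_n - d_p)))  # logistic discriminant, Eq. (5)
    if variant == "CLL":
        # Eq. (7): regularizer emphasising intra-class compactness;
        # alpha is a placeholder, not a value taken from the paper
        p = p + alpha * d_p
    return p

def negative_log_likelihood(p_list, t_list):
    # Eq. (6), negated so that lower is better when used as a loss
    p = np.clip(np.asarray(p_list, dtype=float), 1e-12, 1 - 1e-12)
    t = np.asarray(t_list, dtype=float)
    return -np.sum(t * np.log(p) + (1 - t) * np.log(1 - p))
```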

V. EXPERIMENTS

To verify the effectiveness of the presented method, we conduct experiments on the offline HCCR databases [16], including DB1.0, DB1.1 and the test set of the ICDAR 2013 Chinese handwriting recognition competition [23] (denoted as ICDAR-2013).

Fig. 4: The network architecture of the presented method.

A. Implementation Details

We implement the presented methods using Caffe [8] with our modifications. All experiments are run on four GPUs, and all models are trained with the same implementation, as follows.

1) Data Augmentation: During training, each 128 × 128 image is perturbed by the single model or the combined model, as in [1]. Half of the samples, chosen at random, are flipped horizontally. We also adopt some augmentations proposed in previous work [22], such as adding a random integer ranging from -20 to +20 to the image, grey shifting, Gaussian blur, and so on.

2) Network and Settings: Deep residual networks and the Inception architecture were independently proposed in [6] and [19], and both achieved high performance in the ImageNet challenges. Integrating the tricks of these two papers, we design the architecture shown in Fig. 4. It consists of 2 convolutional layers, 9 Inception layers, 5 pooling layers, 2 fully connected layers, 1 similarity loss layer and 1 softmax loss layer. The first four pooling layers use the max operator and the last pooling layer uses average pooling. The outputs of the 9 Inception layers are added to the outputs of the last fully connected layer. Following [7], batch normalization is used after each convolution and before the ReLU activation. We train the DCNN models using SGD with a mini-batch size of 360. The learning rate is set to 5e-2 initially and reduced gradually to 1e-5. The models are initialized randomly from zero-mean Gaussian distributions and trained on four Titan X GPUs for 300 hours.

B. Results on DB1.1 and ICDAR-2013

In this experiment, we used the 240 writers (nos. 1001-1240) of the DB1.1 database for training, and tested on the remaining writers of DB1.1 and on the ICDAR 2013 Chinese handwriting recognition competition set, respectively. In Section III we introduced the method for training a DCNN model using both classification and similarity ranking signals as supervision, together with three loss functions for similarity ranking, namely Euclidean distance (ED), logistic discriminant based (LD) and conditional log-likelihood loss (CLL). The recognition results on the DB1.1 and ICDAR-2013 test sets are shown in Table I.

TABLE I: Recognition rates (%) on DB1.1 and ICDAR-2013, trained with DB1.1

loss function                          DB1.1 top-1  DB1.1 top-10  ICDAR-2013 top-1  ICDAR-2013 top-10
softmax                                95.72        99.43         96.07             99.59
softmax + similarity ranking (ED)      95.76        99.54         96.27             99.71
softmax + similarity ranking (LD)      95.85        99.55         96.22             99.69
softmax + similarity ranking (CLL)     96.13        99.58         96.44             99.71

First, compared to the baseline DCNN using only the softmax function in Table I, we can see that the recognition accuracy is improved by combining the two types of signals, especially similarity ranking with the CLL loss function. This demonstrates the benefit of the proposed method, improving the top-1 recognition rate from 96.07 percent to 96.44 percent. Next, we compare the results of the three loss functions for similarity ranking: Table I shows that CLL outperforms both ED and LD.

C. Comparison with other State-of-the-art Methods

In this experiment, we used the DB1.0 and DB1.1 [16] databases for training and ICDAR-2013 for testing. To show the performance of the proposed method, we compare it with the GoogleNet-based HCCR method [26], which reports very good results on the ICDAR-2013 dataset. The recognition results on the ICDAR-2013 test set are shown in Table II.

TABLE II: Recognition rates (%) on ICDAR-2013, trained with DB1.0 and DB1.1

loss function   Ensemble  method          top-1  top-10  model size
softmax         no        GoogleNet [26]  96.35  99.80   27.77 MB
softmax         no        DirectMap [25]  97.37  N/A     23.50 MB
softmax         no        ours            96.50  99.78   36.20 MB
softmax + CLL   no        ours            97.07  99.85   36.20 MB
softmax + CLL   yes (4)   ours            97.64  99.91   144.80 MB

First, comparing the recognition results of GoogleNet and the presented network architecture under the same loss function in Table II, we observe that the presented architecture yields a higher recognition rate than the baseline GoogleNet architecture. Next, compared with the state-of-the-art result of the previous work [25], our method achieves a significant improvement, with a relative error rate reduction of 19.45%. It is worth noting that the memory cost of the presented model is 36.20 MB, which is larger than the baseline GoogleNet model (27.77 MB) and the DirectMap model (23.50 MB). That is because the proposed model combines all the Inception layers, with their output filter banks concatenated into a fully connected layer.
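Tables I and II report top-1 and top-10 recognition rates, and the last row of Table II combines four models. The sketch below shows one way such numbers can be computed; averaging the softmax outputs is our assumption, since the paper does not specify how the four-model ensemble is combined.

```python
import numpy as np

def top_k_accuracy(probs, labels, k=1):
    # probs: (num_samples, num_classes) softmax outputs
    # labels: (num_samples,) ground-truth class indices
    topk = np.argsort(probs, axis=1)[:, -k:]
    hits = [labels[i] in topk[i] for i in range(len(labels))]
    return 100.0 * np.mean(hits)

def ensemble_probs(prob_list):
    # simple average of the softmax outputs of several models
    # (an assumed combination rule, not confirmed by the paper)
    return np.mean(np.stack(prob_list, axis=0), axis=0)

# toy usage with random predictions over 3755 classes
rng = np.random.default_rng(0)
labels = rng.integers(0, 3755, size=10)
models = [rng.random((10, 3755)) for _ in range(4)]
models = [p / p.sum(axis=1, keepdims=True) for p in models]
avg = ensemble_probs(models)
print(top_k_accuracy(avg, labels, k=1), top_k_accuracy(avg, labels, k=10))
```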
VI. CONCLUSION

This paper shows that the character classification and similarity ranking supervisory signals are complementary to each other: together they increase the inter-class variations and reduce the intra-class variations, and therefore much better classification performance can be achieved. Combining the two supervisory signals leads to significantly better results than softmax-based character classification alone. Experiments on the ICDAR 2013 offline HCCR competition dataset show that our best result is superior to all previous works. The best test error rate we achieved is 2.36%, which, to our knowledge, is a new state-of-the-art record.

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China under Grant Nos. 61472386 and 61403380, and by the Chongqing Research Program of Basic Research and Frontier Technology (No. cstc2016jcyja0011). The two Titan X GPUs used for this research were donated by the NVIDIA Corporation.

REFERENCES

[1] B. Chen, B. Zhu, and M. Nakagawa. Training of an on-line handwritten Japanese character recognizer by artificial patterns. Pattern Recognition Letters, 35:178-185, 2014.
[2] L. Chen, S. Wang, S. Wang, J. Sun, and J. Sun. Reconstruction combined training for convolutional neural networks on character recognition. In International Conference on Document Analysis and Recognition, pages 431-435, 2015.
[3] D. Ciresan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. In International Conference on Computer Vision and Pattern Recognition, pages 3642-3649, 2012.

[4] D. Ciresan and J. Schmidhuber. Multi-column deep neural networks for offline handwritten Chinese character classification. arXiv, 2013.
[5] S. Ding, L. Lin, G. Wang, and H. Chao. Deep feature learning with relative distance comparison for person re-identification. Pattern Recognition, 48(1):2993-3003, 2015.
[6] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. arXiv, 2015.
[7] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, 2015.
[8] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM International Conference on Multimedia, pages 675-678, 2014.
[9] X.-B. Jin, C.-L. Liu, and X. Hou. Regularized margin-based conditional log-likelihood loss for prototype learning. Pattern Recognition, 43(7):2428-2438, 2010.
[10] J. Liu, Y. Deng, and C. Huang. Targeting ultimate accuracy: Face recognition via deep embedding. arXiv, 2015.
[11] I.-J. Kim, C. Choi, and C. Choi. Improving discrimination ability of convolutional neural networks by hybrid learning. IJDAR, 19(1):1-9, 2016.
[12] F. Kimura, K. Takashina, S. Tsuruoka, and Y. Miyake. Modified quadratic discriminant functions and the application to Chinese character recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(1):149-153, 1987.
[13] C.-L. Liu. Normalization-cooperated gradient feature extraction for handwritten character recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(8):1465-1469, 2007.
[14] C.-L. Liu and K. Marukawa. Pseudo two-dimensional shape normalization methods for handwritten Chinese character recognition. Pattern Recognition, 38(12):2242-2255, 2005.
[15] C.-L. Liu, H. Sako, and H. Fujisawa. Discriminative learning quadratic discriminant function for handwriting recognition. IEEE Transactions on Neural Networks, 15(2):430-444, 2004.
[16] C.-L. Liu, F. Yin, D.-H. Wang, and Q.-F. Wang. Online and offline handwritten Chinese character recognition: Benchmarking on new databases. Pattern Recognition, 46(1):155-162, 2012.
[17] T. V. Phan, J. Gao, B. Zhu, and M. Nakagawa. Effects of line densities on nonlinear normalization for online handwritten Japanese character recognition. In International Conference on Document Analysis and Recognition, pages 834-838, 2011.
[18] F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A unified embedding for face recognition and clustering. In International Conference on Computer Vision and Pattern Recognition, pages 815-823, 2015.
[19] C. Szegedy, V. Vanhoucke, S. Ioffe, and J. Shlens. Rethinking the Inception architecture for computer vision. arXiv, 2015.
[20] J. Wang, Y. Song, T. Leung, C. Rosenberg, J. Wang, J. Philbin, B. Chen, and Y. Wu. Learning fine-grained image similarity with deep ranking. In International Conference on Computer Vision and Pattern Recognition, pages 1386-1393, 2014.
[21] C. Wu, W. Fan, Y. He, J. Sun, and S. Naoi. Handwritten character recognition by alternately trained relaxation convolutional neural network. In International Conference on Frontiers in Handwriting Recognition, pages 291-296, 2014.
[22] R. Wu, S. Yan, Y. Shan, Q. Dang, and G. Sun. Deep Image: Scaling up image recognition. arXiv, 2015.
[23] F. Yin, Q.-F. Wang, X.-Y. Zhang, and C.-L. Liu. ICDAR 2013 Chinese handwriting recognition competition.
In International Conference on Document Analysis and Recognition, pages 1464-1470, 2013.
[24] R. Zhang, L. Lin, R. Zhang, W. Zuo, and L. Zhang. Bit-scalable deep hashing with regularized similarity learning for image retrieval and person re-identification. IEEE Transactions on Image Processing, 24(12):4766-4779, 2015.
[25] X.-Y. Zhang, Y. Bengio, and C.-L. Liu. Online and offline handwritten Chinese character recognition: A comprehensive study and new benchmark. Accepted by Pattern Recognition, 2016.
[26] Z. Zhong, L. Jin, and Z. Xie. High performance offline handwritten Chinese character recognition using GoogLeNet and directional feature maps. In International Conference on Document Analysis and Recognition, pages 846-850, 2015.