Boosting Optical Character Recognition: A Super-Resolution Approach

Chao Dong, Ximei Zhu, Yubin Deng, Chen Change Loy, Member, IEEE, and Yu Qiao, Member, IEEE

arXiv:1506.02211v1 [cs.cv] 7 Jun 2015

C. Dong, Y. Deng and C. C. Loy are with the Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong. E-mail: {dc012, danny.s.deng, ccloy}@ie.cuhk.edu.hk. X. Zhu and Y. Qiao are with the Shenzhen Institutes of Advanced Technology. E-mail: {xm.zhu, yu.qiao}@siat.ac.cn.

Abstract: Text image super-resolution is a challenging, open research problem in the computer vision community. In particular, low-resolution images hamper the performance of typical optical character recognition (OCR) systems. In this article, we summarize our entry to the ICDAR2015 Competition on Text Image Super-Resolution. Experiments are based on the provided ICDAR2015 TextSR dataset [3] and the released Tesseract-OCR 3.02 system [1]. Our winning text image super-resolution framework substantially improves the OCR performance on low-resolution inputs, reaching an OCR accuracy score of 77.19%, which is comparable with that of the original high-resolution images (78.80%).

Index Terms: super-resolution; optical character recognition.

I. INTRODUCTION

Optical Character Recognition (OCR) systems aim at converting textual images to machine-encoded text. By mimicking the human reading process, OCR systems enable a machine to understand the text information in images and to recognize specific alphanumeric words, text phrases or sentences. Yet, due to the complexity of input contents and variations caused by various sources, accurately recognizing optical characters can be challenging for standard OCR systems. In particular, OCR systems trained on high-resolution (HR) text images may fail to correctly predict the text in low-resolution (LR) text images: LR text images lack fine details, making it harder for the OCR systems to retrieve textual information from commonly used OCR features. As such, performing super-resolution on the input images is an intuitive pre-processing step towards accurate optical character recognition.

To perform image super-resolution, unseen pixels need to be predicted in a well-constrained solution space with strong priors; recovering the HR image from its LR counterpart amounts to a non-linear mapping of pixel values. Following [5], [6], we adopt the Super-Resolution Convolutional Neural Network (SRCNN) approach and train several CNNs targeted at text images. The resulting text image super-resolution framework demonstrates its effectiveness in recovering HR text images. Our main contributions are twofold:

1) We extend the SRCNN approach to the specific text-image domain, and conduct a comprehensive investigation of different network settings (e.g. filter size and number of layers).
2) We conduct model combination to further improve the performance.

In the ICDAR 2015 Competition on Text Image Super-Resolution¹, we achieve the best results [3], which improve the OCR performance by 16.55% in accuracy compared with bicubic interpolation.

¹ Competition website: http://liris.cnrs.fr/icdar-sr2015/

II. METHODOLOGY

A. Super-Resolution Convolutional Neural Network
The Super-Resolution Convolutional Neural Network (SRCNN) [5], [6] is primarily designed for general single-image super-resolution, and can easily be extended to specific super-resolution topics (e.g. face hallucination and text image super-resolution) or other low-level vision problems (e.g. denoising). The basic framework of SRCNN consists of three convolutional layers without pooling or fully-connected layers. This allows it to accept an LR image Y of any size (after interpolation and padding) and directly output the HR image F(Y). Each of the three layers is responsible for a specific task. The first layer performs feature extraction on the input image and represents each patch as a high-dimensional feature vector. The second layer then non-linearly maps these feature vectors to another set of feature vectors, which are conceptually the representations of the HR image patches. The last layer recombines these representations and reconstructs the final HR image. The process can be expressed as:

    F_i(Y) = max(0, W_i ∗ F_{i-1}(Y) + B_i),  i ∈ {1, 2},  with F_0(Y) = Y,    (1)
    F(Y) = W_3 ∗ F_2(Y) + B_3,    (2)

where W_i and B_i represent the filters and biases of the i-th layer respectively, F_i denotes the output feature maps, and ∗ denotes the convolution operation. W_i contains n_i filters of support n_{i-1} × f_i × f_i, where f_i is the spatial support of a filter, n_i is the number of filters, and n_0 is the number of channels in the input image. Please refer to [6] for more details.

This framework is flexible in the choice of parameters. We can adopt four or more layers with larger filter sizes to further improve the performance. For example, in [4], a feature enhancement layer is added after the feature extraction layer to denoise the extracted features. In the text image domain, we likewise attempt deeper networks to further improve performance.
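To make the three-layer pipeline concrete, the following is a minimal sketch of Eqs. (1)-(2) written in PyTorch; this is an assumption for illustration, since the paper does not provide code. Defaults follow the 64(9)-32(1)-1(5) baseline setting adopted later in Section III.

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Three-layer SRCNN of Eqs. (1)-(2); defaults give the 64(9)-32(1)-1(5) baseline."""

    def __init__(self, channels=1, f1=9, f2=1, f3=5, n1=64, n2=32):
        super().__init__()
        # Layer 1: patch extraction, F_1(Y) = max(0, W_1 * Y + B_1)
        self.conv1 = nn.Conv2d(channels, n1, kernel_size=f1)
        # Layer 2: non-linear mapping, F_2(Y) = max(0, W_2 * F_1(Y) + B_2)
        self.conv2 = nn.Conv2d(n1, n2, kernel_size=f2)
        # Layer 3: reconstruction, F(Y) = W_3 * F_2(Y) + B_3 (no ReLU)
        self.conv3 = nn.Conv2d(n2, channels, kernel_size=f3)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, y):
        # y is the bicubic-upsampled LR image; each valid convolution trims
        # f_i - 1 pixels per dimension, so the input must be pre-padded if a
        # fixed-size output is wanted, as the paper does in Section III.
        h = self.relu(self.conv1(y))
        h = self.relu(self.conv2(h))
        return self.conv3(h)
```

For example, an unpadded 18×18 input passed through the default network yields a 6×6 output (18 − 8 − 0 − 4), which is why the border handling discussed in Section III is needed.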

B. Model Combination

Model combination has been widely used in high-level vision problems (e.g. object detection [8]). In SRCNN, each output pixel can be regarded as a prediction of a pixel value, so averaging the predictions of different networks can inherently improve accuracy. Further, as SRCNN is a learning-based method, the test results may vary with the checkpoint and the initial values; model combination neutralizes this effect and makes the results more stable and reliable.

We adopt a greedy search strategy to find the best model combination. Suppose we have successfully trained n models. First, we select the model that achieves the highest PSNR (or OCR score) on the validation set. Then we combine its results with those of each of the n models in turn, obtaining n 2-model combinations; here, we simply average their output pixel values. The combination with the highest PSNR (or OCR score) is saved as the best 2-model combination. We keep these two models unchanged and search among all n models for a third in a second round of combination. Proceeding in this way, we identify the best m-model combination using the n models. Note that each model can be selected more than once, and a combination of more models is not necessarily better than one with fewer. At last, we choose the best model combination among the m selected combinations.
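A compact sketch of this greedy search follows; the helper names and the PSNR-based criterion are illustrative, not the competition code.

```python
import numpy as np

def psnr(pred, gt, peak=255.0):
    # Peak signal-to-noise ratio between a prediction and its ground truth.
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def greedy_combination(outputs, gt, max_models=14):
    """Greedy search over model combinations, as in Section II-B.

    outputs[i][j] is model i's super-resolved result for validation image j;
    gt[j] is the corresponding ground-truth HR image. Combinations average
    output pixel values, and a model may be selected more than once."""
    n, n_img = len(outputs), len(gt)

    def score(idx):
        # Mean validation PSNR of the pixel-wise average of the chosen models.
        fused = [np.mean([outputs[i][j] for i in idx], axis=0) for j in range(n_img)]
        return np.mean([psnr(p, g) for p, g in zip(fused, gt)])

    combo = [max(range(n), key=lambda i: score([i]))]  # best single model
    combos = [list(combo)]
    while len(combo) < max_models:
        # Fix the current combination and try appending each of the n models.
        combo.append(max(range(n), key=lambda i: score(combo + [i])))
        combos.append(list(combo))
    # The best of these 1..max_models combinations is chosen at the end.
    return combos
```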
III. EXPERIMENTS

Datasets. We use the ICDAR2015 TextSR dataset [3] provided by the ICDAR 2015 Competition on Text Image Super-Resolution. The dataset includes a training set and a test set. The training set consists of 567 HR-LR grayscale image pairs together with annotation files. The HR and LR images are downsampled by factors of 2 and 4 from original images extracted from French TV video streams. The height of the LR images starts from 9 pixels, and the annotations use standard letters, numbers and 14 special characters. Figure 1 shows several examples of HR-LR image pairs in the training set. The test set contains only 141 LR images, without annotations or HR images. For more details of the dataset, please refer to [3]. To evaluate performance, we select 30 image pairs from the training set as our validation set; the training set used in our experiments thus contains the remaining 537 image pairs.

Fig. 1. Examples of LR-HR image pairs in the training set (LR heights of 10 to 20 pixels, each paired with an HR image of twice that height).

Implementation details. First, we upsample all LR images by a factor of 2 using bicubic interpolation, after which the HR and LR images in the training set are of the same size. In the training phase, the ground-truth images and LR input samples are prepared as 18×18-pixel sub-images² cropped from the training image pairs with a stride of 2 vertically and 5 horizontally. In total, the 537 images provide 156,941 training samples. Different from [5], [6], we cannot remove all borders during training, because the text images are much smaller than generic images. As a compromise, we fix the output size to 14×14 and pad the input LR images according to the network scale: for a three-layer network, for example, we pad (f_1 + f_2 + f_3 − 3) − 4 pixels with zeros. We use the Mean Squared Error (MSE) as the loss function, evaluated as the difference between the central 14×14 crop of the ground-truth image and the network output. The filter weights of each layer are initialized by drawing randomly from a Gaussian distribution with zero mean and standard deviation 0.001 (biases are initialized to 0). The learning rate is 10⁻⁵ for the last layer and 10⁻⁴ for the remaining layers. All results are based on the checkpoint at 5,000 iterations (about 7.8×10⁸ backpropagations).

² The number 18 is the minimum height of the HR images.
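The border arithmetic can be made concrete with a small helper (hypothetical, not the paper's code): each valid convolution trims f_i − 1 pixels per dimension, so mapping an 18×18 sub-image to a fixed 14×14 output requires total zero-padding of Σ(f_i − 1) − 4 pixels, which for three layers is exactly (f_1 + f_2 + f_3 − 3) − 4.

```python
def output_size(in_size, filter_sizes, pad=0):
    # Spatial size after a stack of valid convolutions on a zero-padded input.
    size = in_size + pad
    for f in filter_sizes:
        size -= f - 1
    return size

def padding_needed(filter_sizes, in_size=18, out_size=14):
    # Total zero-padding (summed over both borders) needed to map an
    # in_size input to an out_size output.
    return sum(f - 1 for f in filter_sizes) - (in_size - out_size)

assert output_size(18, [9, 1, 5]) == 6        # unpadded: the border loss the paper avoids
assert padding_needed([9, 1, 5]) == 8         # (f1 + f2 + f3 - 3) - 4 for the baseline
assert output_size(18, [9, 1, 5], pad=8) == 14  # fixed 14x14 output after padding
```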

A. Super-Resolution Results of Single Models

To find the optimal network settings, we conduct four sets of experiments with different filter sizes, numbers of filters, numbers of layers and initial values.

1) Filter size: First, we adopt a general three-layer network structure and vary the filter size. The basic setting proposed in [5] is f_1 = 9, f_2 = 1, f_3 = 5, n_1 = 64, n_2 = 32 and n_3 = 1, denoted as 64(9)-32(1)-1(5). Following [6], we enlarge the filter size of the second layer f_2 to 3, 5 and 7, giving four networks: 64(9)-32(1)-1(5), 64(9)-32(3)-1(5), 64(9)-32(5)-1(5) and 64(9)-32(7)-1(5). Convergence curves on the validation set are shown in Figure 2. Using a larger filter size clearly improves the performance. It is worth noticing that the difference between 64(9)-32(5)-1(5) and 64(9)-32(7)-1(5) is marginal, so enlarging the filter size further would be of little help.

Fig. 2. Three-layer networks with different filter sizes.

2) Number of filters: We then examine whether the performance benefits from more filters. To do this, we fix the filter sizes at f_1 = 9, f_2 = 7, f_3 = 5 and increase the number of filters to n_1 = 128 and n_2 = 64, separately. The three networks are 64(9)-32(7)-1(5), 128(9)-32(7)-1(5) and 64(9)-64(7)-1(5). From the convergence curves shown in Figure 3, we observe that the performance is pushed much higher. However, looking at the model complexity, 128(9)-32(7)-1(5) and 64(9)-64(7)-1(5) have 211,872 and 207,488 parameters respectively, almost double the 106,336 parameters of 64(9)-32(7)-1(5). We therefore investigate lower-cost alternatives that could still improve the performance.

Fig. 3. Three-layer networks with different numbers of filters.

3) Number of layers: A reasonable attempt to improve the performance at low cost is to make the network deeper. Here, we adopt a four-layer network structure by embedding an additional feature enhancement layer as in [4]. Furthermore, we gradually enlarge the filter sizes to get better results. Overall, we train 8 networks: 64(9)-32(7)-16(1)-1(5), 64(9)-32(7)-16(3)-1(5), 64(9)-32(7)-16(5)-1(5), 64(9)-32(5)-16(5)-1(5), 64(11)-32(9)-16(7)-1(5), 64(11)-32(9)-16(9)-1(5), 64(13)-32(11)-16(9)-1(5) and 64(15)-32(13)-16(11)-1(5).

Figure 4 shows the convergence curves of the four-layer network 64(9)-32(7)-16(1)-1(5) and the three-layer networks 128(9)-32(7)-1(5) and 64(9)-32(7)-1(5). Interestingly, the smallest four-layer network 64(9)-32(7)-16(1)-1(5) achieves performance similar to the largest three-layer network 128(9)-32(7)-1(5), while having only 106,448 parameters, roughly half of the 211,872 of 128(9)-32(7)-1(5). This indicates that deeper networks can outperform shallower ones even with fewer parameters.

Fig. 4. Comparison between three-layer and four-layer networks.

When we enlarge the filter sizes of the first three layers, the performance improves and reaches a plateau at 64(11)-32(9)-16(9)-1(5) (see Figure 5). Enlarging the filter sizes further makes the network overfit the training data (see 64(13)-32(11)-16(9)-1(5) and 64(15)-32(13)-16(11)-1(5)). To avoid overfitting, a larger training set with more diverse images could be introduced, but we stick to the provided training set for the competition. When we try much deeper networks (i.e. five layers), we find that training is hard to converge even with smaller learning rates. This phenomenon is also observed in [6], where deeper networks are hard to train on generic images. Other initialization strategies, as in [4], might help, but this is beyond the scope of this paper.

Fig. 5. Four-layer networks with different filter sizes.
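The parameter counts quoted in these comparisons follow directly from the layer notation: layer i holds n_{i−1} × n_i × f_i² weights. A small hypothetical helper reproduces the paper's figures (biases excluded, single-channel input, which is how the quoted numbers work out):

```python
def srcnn_params(spec, in_channels=1):
    """Weight count for a network written as [(n, f), ...] in the
    n1(f1)-n2(f2)-... notation; biases are excluded, matching the paper."""
    total, prev = 0, in_channels
    for n, f in spec:
        total += prev * n * f * f  # n filters of support prev x f x f
        prev = n
    return total

assert srcnn_params([(64, 9), (32, 7), (1, 5)]) == 106_336
assert srcnn_params([(128, 9), (32, 7), (1, 5)]) == 211_872
assert srcnn_params([(64, 9), (64, 7), (1, 5)]) == 207_488
assert srcnn_params([(64, 9), (32, 7), (16, 1), (1, 5)]) == 106_448
```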
4) Initial values: Finally, we explore whether the results are affected by different initial values. We train three networks with the same structure 64(9)-32(7)-16(5)-1(5) but different initial values, drawn randomly from the same Gaussian distribution three times. From Figure 6, we can see that the three convergence curves behave slightly differently. This suggests that the results are not unique under different initial values. On the other hand, it also means we can combine the results of networks with the same structure but different initial values.

Fig. 6. Four-layer networks with different initial values.
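As a concrete illustration of this setup, the sketch below (assuming PyTorch and reusing the hypothetical SRCNN class from the earlier sketch) builds several copies of one architecture initialized from the same Gaussian but with different seeds:

```python
import torch
import torch.nn as nn

def init_gaussian(model, std=1e-3, seed=0):
    # Filter weights drawn from a zero-mean Gaussian with std 0.001 and
    # biases set to 0, per the implementation details; the seed selects
    # which "initial values" variant we get.
    torch.manual_seed(seed)
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.normal_(m.weight, mean=0.0, std=std)
            nn.init.zeros_(m.bias)
    return model

# Three identically structured networks differing only in initialization
# (three-layer sketch; the paper's experiment uses a four-layer variant).
models = [init_gaussian(SRCNN(f2=7), seed=s) for s in (0, 1, 2)]
```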

B. Model Combination

To conduct model combination, we select 11 four-layer networks with different filter sizes and initial values: 64(9)-32(7)-16(1)-1(5), 64(9)-32(7)-16(3)-1(5), 64(9)-32(7)-16(5)-1(5) (four networks with different initial values), 64(9)-32(5)-16(5)-1(5), 64(11)-32(9)-16(7)-1(5), 64(11)-32(9)-16(9)-1(5), 64(13)-32(11)-16(9)-1(5) and 64(15)-32(13)-16(11)-1(5). Following the greedy search strategy, we successively obtain the 14 best model combinations (from the best single model to the best 14-model combination), evaluated with PSNR. Note that we could also use the OCR score as the evaluation criterion, in which case the selected combinations would favour a high OCR score. The PSNR values³ of the selected 14 best model combinations are shown in Figure 7, from which we make two main observations. First, model combination significantly improves the performance, e.g. a 0.53 dB gain from the best single model to the best 2-model combination. Second, the PSNR values are stable when combining 5 or more models.

Fig. 7. PSNR values of different model combinations.

Figure 8 shows the super-resolution results of the best single model 64(9)-32(7)-16(5)-1(5) and the best 14-model combination. As can be observed, the super-resolved images are very close to the ground-truth HR images.

Fig. 8. Super-resolution results of different methods (high-resolution ground truth, bicubic, the best single model, and the 14-model combination).

³ Note that the PSNR values are not in accord with the convergence curves, since we remove 4-pixel borders during training but keep them during testing.

C. Test Performance

In the competition, we submitted two sets of results: one favouring the OCR accuracy score (SRCNN-1) and the other favouring the PSNR value (SRCNN-2). The performance is evaluated on the 141 test images with PSNR, RMSE, SSIM and OCR scores by the competition committee. The results are published in [3] and listed in Table I. The compared methods are briefly described as follows: bicubic and Lanczos3 interpolation are the baseline methods; Orange Labs [2] is the internal approach of the organizer; Zeyde et al. [11] and A+ [9] are representative state-of-the-art super-resolution methods; Synchromedia Lab [7] and ASRS [10] are the methods of the other two competing teams.

TABLE I
RESULTS OF THE ICDAR2015 COMPETITION ON TEXT IMAGE SUPER-RESOLUTION

Method               | RMSE  | PSNR  | MSSIM | OCR (%)
Bicubic              | 19.04 | 23.50 | 0.879 | 60.64
Lanczos3             | 16.97 | 24.65 | 0.902 | 64.36
Orange Labs [2]      | 11.27 | 28.25 | 0.953 | 74.12
Zeyde et al. [11]    | 13.05 | 27.21 | 0.941 | 69.72
A+ [9]               | 10.03 | 28.50 | 0.966 | 73.10
Synchromedia Lab [7] | 62.67 | 12.66 | 0.623 | 65.93
ASRS [10]            | 12.86 | 26.98 | 0.950 | 71.25
SRCNN-1 [5], [6]     | 7.52  | 31.75 | 0.980 | 77.19
SRCNN-2 [5], [6]     | 7.24  | 31.99 | 0.981 | 76.10

Our method achieves the best performance on both the OCR score (SRCNN-1) and the reconstruction errors (SRCNN-2). SRCNN-1 improves the OCR performance by 16.55% in accuracy compared with bicubic interpolation. It is worth noting that the OCR score obtained with the original HR images is 78.80%, only 1.61% higher than SRCNN-1. This indicates that our method is extremely effective in improving OCR accuracy.
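As a side note on the metrics in Table I: for 8-bit images, PSNR and RMSE are two views of the same error,

$$\mathrm{PSNR} = 20 \,\log_{10}\!\left(\frac{255}{\mathrm{RMSE}}\right),$$

which is why the two columns rank the methods almost identically. The published columns do not invert exactly under this identity, presumably because the scores are averaged per test image rather than computed from a single global error.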

IV. CONCLUSIONS

In this article, we summarize our attempt to adopt an SRCNN approach for text image super-resolution in order to facilitate optical character recognition. Our method boosts the baseline OCR performance by a large margin (16.55%). As general image super-resolution has become an increasingly important problem in computer vision, we believe more comprehensive studies are needed on whether (or to what extent) super-resolution further benefits other low-level vision tasks.

REFERENCES

[1] Tesseract-OCR. http://code.google.com/p/tesseract-ocr/
[2] Peyrard, C., Mamalet, F., Garcia, C.: A comparison between multi-layer perceptrons and convolutional neural networks for text image super-resolution. In: VISAPP (2015)
[3] Peyrard, C., Baccouche, M., Mamalet, F., Garcia, C.: ICDAR2015 competition on text image super-resolution. In: International Conference on Document Analysis and Recognition (2015)
[4] Dong, C., Deng, Y., Loy, C.C., Tang, X.: Compression artifacts reduction by a deep convolutional network. arXiv:1504.06993 (2015)
[5] Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: European Conference on Computer Vision, pp. 184-199 (2014)
[6] Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence (2015)
[7] Moghaddam, R.F., Cheriet, M.: A multi-scale framework for adaptive binarization of degraded document images. Pattern Recognition 43(6), 2186-2198 (2010)
[8] Ouyang, W., Luo, P., Zeng, X., Qiu, S., Tian, Y., Li, H., Yang, S., Wang, Z., Xiong, Y., Qian, C., et al.: DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (2015)
[9] Timofte, R., De Smet, V., Van Gool, L.: A+: Adjusted anchored neighborhood regression for fast super-resolution. In: Asian Conference on Computer Vision (2014)
[10] Walha, R., Drira, F., Lebourgeois, F., Garcia, C., Alimi, A.M.: Resolution enhancement of textual images via multiple coupled dictionaries and adaptive sparse representation selection. International Journal on Document Analysis and Recognition (IJDAR), pp. 1-21 (2015)
[11] Zeyde, R., Elad, M., Protter, M.: On single image scale-up using sparse-representations. In: Curves and Surfaces, pp. 711-730 (2012)