Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition


Chee Kheng Ch'ng, Chee Seng Chan
Centre of Image & Signal Processing, Faculty of Computer Science & Info. Technology, University of Malaya, Malaysia
chngcheekheng@siswa.um.edu.my, cs.chan@um.edu.my

Abstract—Text in curved orientation, despite being one of the most common text orientations in real-world environments, has close to zero presence in well-received scene text datasets such as ICDAR'13 and MSRA-TD500. The main motivation of Total-Text is to fill this gap and facilitate a new research direction for the scene text community. On top of the conventional horizontal and multi-oriented text, it features curved text. Total-Text is highly diversified in orientations: more than half of its images contain a combination of more than two orientations. Recently, a new breed of solutions that cast text detection as a segmentation problem has demonstrated effectiveness against multi-oriented text. To evaluate its robustness against curved text, we fine-tuned DeconvNet and benchmarked it on Total-Text. Total-Text and its annotation are publicly available.

Keywords—scene text dataset; curve-oriented text; segmentation-based text detection

I. INTRODUCTION

Scene text detection is one of the most active computer vision topics due to the growing demands of applications such as multimedia retrieval, industrial automation, and assistive devices for vision-impaired people. Given a natural scene image, the goal of text detection is to determine the existence of text, and to return its location if it is present. Well-known public datasets such as ICDAR'03, '11, '13 [1] (termed ICDARs from here onwards) and MSRA-TD500 [2] have played a significant role in initiating the momentum of scene-text-related research. One similarity across all the images of ICDARs is that all the texts are in horizontal orientation [12].
Such observation has inspired researchers to incorporate a horizontal assumption [3]-[7] in solving the scene text detection problem. In 2012, Yao et al. [2] introduced a new scene text dataset, MSRA-TD500, that challenged the community with texts arranged in multiple orientations. Its popularity in turn defined the convention of multi-oriented text. However, a closer look into MSRA-TD500 reveals that most, if not all, of the texts are still arranged in a straight-line manner, as in ICDARs (more details in Section III). Curve-oriented text (termed curved text from here onwards), despite its commonness, is missing from the context of study. To the best of our knowledge, CUTE80 [8] is the only scene text dataset available to date with curved text. However, its scale is too small, with only 80 images, and it has very minimal scene diversity.

Figure 1: Annotation details of Total-Text, including transcription, polygon-shaped and rectangular bounding box vertices, orientations, care and do-not-care regions, and binary mask.

Without the motivation of a proper dataset, effort in solving the curved text detection problem is rarely seen. This phenomenon brings us to the primary contribution of this paper: Total-Text, a scene text dataset collected with curved text in mind, filling the gap in scene text datasets in terms of text orientations. It has 1,555 scene images and 9,330 annotated words in three different text orientations: horizontal, multi-oriented, and curved. Orientation assumptions are commonly seen in text detection algorithms. We believe that heuristic designs catering to specific text orientations hold back the generalization of text detection systems against real-world text with unconstrained orientations. Recent works [9]-[11] have started to cast text detection as a semantic segmentation problem, and achieved state-of-the-art results on the ICDAR'11, '13 and MSRA-TD500 datasets.
They have reported successful detection of curved text as well. He et al.'s [10] system in particular has no orientation assumption or heuristic grouping mechanism. This brings us to the secondary contribution of this paper: we looked into this new breed of solution and reveal how it handles multi-oriented

text in natural scenes.

Figure 2: Curved text is commonly seen in real-world scenery.

II. RELATED WORKS

This section discusses closely related works, specifically scene text datasets and text detection systems. For completeness, readers are recommended to read [12].

A. Scene Text Datasets

ICDARs [1] have three variants. ICDAR'03 started out with 509 camera-taken scene text images; all scene texts in the dataset appear in horizontal orientation. In ICDAR'11, the total number of images was reduced to 484 to eliminate duplication from the previous version. ICDAR'13 further trimmed the 2011 version down to 462 images of horizontal English text, while improving its text categories and tasks. Recently, ICDAR launched a new challenge [13] named Incidental Scene Text (also known as ICDAR'15), which is based on 1,670 images captured with wearable devices. It is more challenging than previous datasets as it includes text with arbitrary orientations, much of it out of focus. MSRA-TD500 [2] was introduced in 2012 to address the lack of arbitrarily oriented text in scene text datasets. It has 300 training and 200 testing images, annotated with minimum-area rectangles. COCO-text [14] was released in early 2016 and is the largest scene text dataset to date, with 63,686 images and 173,589 labeled text regions. This large-scale dataset contains a wide variety of text orientations: horizontal, arbitrary, and curved. However, it used axis-oriented rectangles as groundtruth, which are applicable only to horizontal and vertical texts. CUTE80 [8] is, to the best of our knowledge, the only publicly available curved text dataset. It has only 80 images and limited sceneries.

B. Scene Text Detection

Scene text detection has seen significant progress after the seminal works by Epshtein et al. [15] and Neumann and Matas [16].
In the former, the Stroke Width Transform (SWT) was proposed to detect text: components with similar stroke widths are grouped, and their properties are studied to classify them as text or non-text. In the latter, Maximally Stable Extremal Regions (MSER) were exploited to extract text components; geometrical properties of the components and a classifier were used to detect text. Both represent characters better than other low-level features such as color, edge, and texture. Upon picking up potential character candidates, these connected-component-based algorithms typically go through text line generation, candidate filtering, and segmentation, as pointed out by the survey [12]. As with many other computer vision tasks, incorporating Convolutional Neural Networks (CNN) to localize text is a very active research direction at the moment. Huang et al. [6] trained a character classifier to examine components generated by MSER, with the objective of improving the robustness of the feature extraction process. Alongside this work, [17], [18] also trained CNNs to classify text components from non-text. This line of work demonstrated the high discriminative power of the CNN as a feature extractor. Interestingly, however, Zhang et al. [9] argued that leveraging the CNN as a character detector restricts its potential due to the local nature of characters. Zhang et al. trained two Fully Convolutional Networks (FCN) [19]: 1) a Text-Block FCN that considers both local and global contextual information at the same time to identify text regions in an image, and 2) a Character-Centroid FCN to eliminate false text line candidates. However, text line generation, which plays a key role in grouping characters into words, did not benefit much from the robust CNN. While most algorithms [9], [18] handcrafted the text line generation process, He et al. [10] trained an FCN to infer text line candidates.
By cascading a text region network and a text line network using supervised FCNs, the Cascaded Convolutional Text Network (CCTN) achieved generalization in terms of text orientations, and is one of the best-performing systems on both horizontal and arbitrarily oriented scene text datasets: ICDAR 2013 and MSRA-TD500.

III. TOTAL-TEXT DATASET

This section discusses a) the motivation for collecting Total-Text; b) observations made on horizontal, multi-oriented, and curved text; c) the orientation assumption aspect of current state-of-the-art algorithms; and d) different aspects and statistics of Total-Text.

A. Dataset Attributes

Curved text is an overlooked problem. The effort of collecting this dataset is motivated by the absence of curved text in existing scene text datasets. Curved text can be easily

found in real-life scenes such as business logos, signs, and entrances, as depicted in Fig. 7d; surprisingly, such data has close to zero presence in the current datasets [1], [2], [13].

Figure 3: 1st row: Examples from ICDAR 2013, ICDAR 2015 and MSRA-TD500; 2nd row: Slightly curved to extremely curved text examples from Total-Text.

Figure 4: (a) Yin et al. [22] (red bounding box) and Huang et al. [6] (blue bounding box); (b) Shi et al. [7]. These show that the current state-of-the-art solutions could not detect curved text effectively.

The most popular scene text datasets over the decade, ICDARs, have only horizontal text [12]. Consequently, the vast majority of algorithms assume text linearity to tackle the problem effectively. As a result of this overwhelming attention, text detection performance on ICDARs has saturated at quite a high point (around 0.9 in terms of f-score). Meanwhile, multi-oriented text has also received a certain amount of attention from the community. MSRA-TD500 is the well-known dataset that introduced this challenge to the field, and algorithms like [9], [20] were designed to cater to multi-oriented text. To the best of our knowledge, scene text detection with curved orientation in consideration [8] remains relatively unpopular. We believe that the lack of such a dataset is the obvious reason why the community has overlooked it. Hence, we propose Total-Text with 4,265 curved text instances out of 9,330 in total, hoping to spur the community's interest in addressing curved text.

Curved text observation. Geometrically speaking, a straight line has no angle variation along the line, and can thus be described by a linear function, y = mx + c. A curved line, by contrast, is free of this restriction: its angle may vary along the line. Shifting to the scene text perspective, we observed that a horizontally oriented word is a series of characters that can be connected by a straight line, in particular along their bottom alignment in most cases.
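The straight-line observation above can be made concrete with a least-squares fit: the character anchor points of a straight word are well explained by y = mx + c, whereas an arc-shaped word leaves a large linear residual and needs a higher-order polynomial. A minimal NumPy sketch, using hypothetical anchor points rather than real dataset annotations:

```python
import numpy as np

# Hypothetical bottom-alignment anchor points (x, y) of the characters in a word.
straight = np.array([[0, 0], [1, 1], [2, 2], [3, 3]], dtype=float)          # slanted word
curved = np.array([[0, 0], [1, 3], [2, 4], [3, 3], [4, 0]], dtype=float)    # arc-shaped word

def line_residual(pts):
    """RMS residual of the best straight-line fit y = mx + c."""
    m, c = np.polyfit(pts[:, 0], pts[:, 1], 1)
    return float(np.sqrt(np.mean((pts[:, 1] - (m * pts[:, 0] + c)) ** 2)))

def poly_residual(pts, deg=2):
    """RMS residual of a degree-`deg` polynomial fit."""
    coeffs = np.polyfit(pts[:, 0], pts[:, 1], deg)
    return float(np.sqrt(np.mean((pts[:, 1] - np.polyval(coeffs, pts[:, 0])) ** 2)))

print(line_residual(straight))  # ~0: a straight line explains the word
print(line_residual(curved))    # large: the straight-line assumption breaks
print(poly_residual(curved))    # ~0 again: a quadratic captures the arc
```

The curved example was chosen to lie exactly on y = -x^2 + 4x, so a degree-2 fit recovers it perfectly while the best straight line misses every point.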
At the same time, multi-oriented text, in scene text convention, can also be connected by a straight line, given an angle offset with respect to the horizontal. Meanwhile, the characters in a curved word do not share a unified angle offset, and are better fitted by a polynomial line at the text level (refer to Fig. 3 for image examples). During dataset collection, we found that curved text in natural images varies from slightly curved to extremely curved. Also, it is not surprising that most instances are shaped as a symmetric arc, owing to the symmetry preference in human vision [21].

Orientation assumption. We observed that an orientation assumption is a must in many algorithms [3]-[6], [9], [20]. We took a closer look into the orientation assumption aspect of existing text detection algorithms to see how it fits the observations we made on curved text. We mainly focused on systems whose authors claimed multi-oriented text detection capability and reported results on MSRA-TD500. Zhang et al. [9] first used an FCN to create a saliency map and generate text blocks. The system then draws straight lines from the middle points of the generated text blocks, aiming to hit as many character components as possible; the straight line with the angle offset that hits the most text blocks is taken as the text line for the subsequent step. We believe that such a mechanism would not work on our dataset, as a straight line would miss the polynomial nature of curved text. [20] focused on the text candidate construction part to detect multi-oriented text. Their algorithm first clusters character pairs with consistent orientation or perspective

view into the same group.

Figure 5: Comparison between the conventional rectangular bounding box (red) and the proposed polygon-shaped bounding region (green) in Total-Text. The polygon shape appears to be the better groundtruth candidate.

Figure 6: The Total-Text dataset is challenging due to its highly diversified orientation compositions and scenery. (a) Various text orientations (from left to right). Top (one orientation): HC; VC; Cir and W. Middle (two orientations): Cir+H; MO+HC; W+H. Bottom (three orientations): H+MO+VC; H+MO+HC; H+MO+Cir. (b) Various text fonts and image backgrounds. Legend: H=horizontal, MO=multi-oriented, HC=horizontal curve, VC=vertical curve, Cir=circular, W=wavy.

As we can see in Fig. 3 (second row, second and third images in particular), characters in a single curved word can vary considerably in orientation. In fact, both of these algorithms, along with [7], reported failures on the same curved text images in MSRA-TD500, as illustrated in Fig. 4b. It is worth noting that MSRA-TD500 has only 2 curved text instances in the entire dataset. Last but not least, we ran [22] and [6] on several images of Total-Text; results can be seen in Fig. 4.

Focused scene text as a start. Two of the latest scene text datasets, COCO-text and ICDAR 2015, emerged to challenge current algorithms with incidental images. For example, scene images in ICDAR 2015 [13] were captured without prior effort to position the text. Although not mentioned explicitly, one can deduce that the emergence of these datasets is possibly due to: i) performance of various algorithms on the previous ICDARs datasets having saturated at a rather high point, so a new dataset with a higher level of complexity is deemed necessary; and ii) well-focused scene text being unlikely to be captured by devices in real-world scenarios. While work on curved text detection is considerably rare, we believe it is at its infant stage.
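The orientation variation just described can be quantified: a grouping rule that assumes a roughly constant pairwise character angle, in the spirit of the clustering in [20], finds one stable offset along a straight slanted word but a drifting offset along an arc. A toy sketch with hypothetical character centres (not real detections):

```python
import math

def pairwise_angles(centers):
    """Angle (degrees) of the segment joining each consecutive character pair."""
    return [math.degrees(math.atan2(y2 - y1, x2 - x1))
            for (x1, y1), (x2, y2) in zip(centers, centers[1:])]

# Hypothetical character centres: a 30-degree slanted word vs. an arc-shaped word.
slanted = [(i, i * math.tan(math.radians(30))) for i in range(5)]
arc = [(math.cos(math.radians(a)), math.sin(math.radians(a)))
       for a in (150, 120, 90, 60, 30)]

print(pairwise_angles(slanted))  # constant ~30 deg: one global offset suffices
print(pairwise_angles(arc))      # angles drift from 45 to -45 deg: no single offset fits
```

For the straight word every pairwise angle equals the word's offset, so angle-consistency clustering keeps the characters together; along the arc the offsets span 90 degrees, which is exactly the failure mode on curved text.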
Inspired by the improvement in scene text detection and recognition brought by focused scene text datasets, notably ICDARs and MSRA-TD500, we believe that focused scene text, rather than incidental scene text, is more appropriate to kick-start related research.

Tighter groundtruth is better. ICDAR 2015 employed quadrilaterals in its annotation to cater to perspective-distorted text [13]. However, COCO-text used rectangular bounding boxes [14] like ICDAR 2013, which we think is a poor choice considering the text orientation variations in it. Fig. 5 illustrates the downside of such bounding box annotation: text regions cover a lot of background, which is not ideal groundtruth for either evaluation or training. In Total-Text, we annotated the text regions with polygon shapes that fit tightly, and the groundtruth is provided in polygon vertices format.

Evaluation protocol. Like the ICDARs datasets [12], Total-Text uses DetEval [23]. We made a modification to the minimum intersection area calculation stage to handle our polygon-shaped groundtruth. The evaluation protocol will be made

available as well.

Annotation details. Groundtruth in Total-Text is annotated at word-level granularity. Adopted from COCO-text, word-level texts are uninterrupted sequences of characters separated by a space. As mentioned, Total-Text uses polygon shapes to bind groundtruth words tightly. Apart from that, we also include rectangular bounding box annotation, considering that most current algorithms generate rectangular bounding box outputs. However, it is not an accurate representation, as a big chunk of background area is included due to the nature of curved text; therefore, we do not encourage the use of rectangular bounding boxes on our dataset. Total-Text considers only English characters in natural images; other languages, digital watermarks, and unreadable texts are labelled as "do not care" in the groundtruth. "Do not care" regions picked up by an algorithm should be filtered out before evaluating its performance. Groundtruth for word recognition is also provided along with its spatial coordinates. In addition, the orientation of every instance is annotated for modular convenience: for example, if one prefers to evaluate curved text detection ability only, one can leverage this annotation to filter out instances with other orientations. Last but not least, Total-Text also comes with binary mask groundtruth to cater to recent requirements [9]-[11]. Fig. 1 illustrates all the aforementioned annotation details apart from the pixel-level annotation, which is illustrated in Fig. 8. Considering that the scale of this dataset is manageable, the authors of this paper annotated the entire dataset manually and cross-checked with three other laboratory members.

Figure 7: Statistics of the Total-Text dataset. (a) Text instances per image; (b) text orientations per image; (c) curve variations; (d) occurrence of curved text.

Figure 8: Examples of pixel-level annotation (cropped) in Total-Text.

B. Dataset Statistics

This subsection discusses the statistics of Total-Text.
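The polygon-aware intersection calculation mentioned under the evaluation protocol can be sketched in pure Python with Sutherland-Hodgman clipping plus the shoelace formula; this is our illustrative choice, not necessarily the protocol's actual implementation, and the example polygons are invented:

```python
def polygon_area(poly):
    """Shoelace area; `poly` is a list of (x, y) vertices."""
    return 0.5 * abs(sum(x1 * y2 - x2 * y1
                         for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1])))

def clip_polygon(subject, clip):
    """Sutherland-Hodgman clipping; `clip` must be convex and counter-clockwise."""
    out = list(subject)
    for c1, c2 in zip(clip, clip[1:] + clip[:1]):
        if not out:
            break
        # Signed side of point p w.r.t. the directed clip edge c1 -> c2 (>= 0 is inside).
        side = lambda p, c1=c1, c2=c2: ((c2[0] - c1[0]) * (p[1] - c1[1])
                                        - (c2[1] - c1[1]) * (p[0] - c1[0]))
        def cross(p, q):
            # Intersection of segment p -> q with the clip edge's supporting line.
            sp, sq = side(p), side(q)
            t = sp / (sp - sq)
            return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
        nxt = []
        for p, q in zip(out, out[1:] + out[:1]):
            if side(q) >= 0:
                if side(p) < 0:
                    nxt.append(cross(p, q))
                nxt.append(q)
            elif side(p) >= 0:
                nxt.append(cross(p, q))
        out = nxt
    return out

def intersection_area(gt_polygon, det_rect):
    """Area shared by a polygon groundtruth and a convex detection region."""
    return polygon_area(clip_polygon(gt_polygon, det_rect))

# Toy example: a triangular groundtruth against an axis-aligned detection box.
gt = [(0, 0), (3, 0), (0, 3)]            # CCW triangle, area 4.5
det = [(0, 0), (2, 0), (2, 2), (0, 2)]   # CCW square, area 4
print(intersection_area(gt, det))        # 3.5
```

Detection rectangles (and quadrilaterals) are convex, so they can serve as the clip polygon even when the groundtruth polygon itself is concave.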
All comparisons are made against ICDAR 2013 and MSRA-TD500, as they are the most common benchmarks for horizontal and multi-oriented focused scene text, respectively. Total-Text is split into training and testing sets with 1,255 and 300 images, respectively.

Strength in numbers. Fig. 7 shows a series of statistics for Total-Text. It has a total of 9,330 annotated text instances, or 6 instances per image on average. More than half of the images in Total-Text have 2 or more different orientations, yielding 1.8 orientations per image on average. Both numbers rank first against its competitors [12], showing the complexity of Total-Text. Apart from these numbers, the dataset was also collected with quality in mind, covering scene complexity such as text-like and low-contrast backgrounds, and different font types and sizes; see the image examples in Fig. 6b.

Orientation diversity. Approximately half of the text instances are curved, and the other half is split almost equally between horizontal and multi-oriented. Curved text has its own variations too: based on our observation, we classify them as horizontal curved, vertical curved, circular, and wavy (refer to Fig. 6a for image examples). Their composition in the dataset can be seen in Fig. 7c. Although all the images were collected with curved text in mind, other orientations still occupy half of the total instances. A closer look into the dataset shows that curved text usually appears together with either horizontal or multi-oriented text. The mixture of orientations within an image challenges text detection algorithms to achieve robustness and generalization in terms of text

orientations.

Figure 9: Visualization of the activations in the deconvolution network. The activation maps from top left to bottom right correspond to the output maps from lower to higher layers in the deconvolution network. We select the most representative activation in each layer for effective visualization. (a) Input image; (b) the last deconvolutional layer; (c) the unpooling layer; (d) the last deconvolutional layer; (e) the unpooling layer; (f) the last deconvolutional layer; (g) the unpooling layer; (h) the last deconvolutional layer; (i) the unpooling layer and (j) the last deconvolutional layer.

Scene diversity. In comparison to CUTE80 (the only publicly available curved text dataset), in which the majority of images are of football jerseys, Total-Text is much more diversified. Fig. 7d shows where curved text usually appears. Business-related places such as restaurants (e.g., Nando's, Starbucks), company branding logos, and merchant stores account for 61.2% of the curved text instances. Tourist spots such as parks (e.g., Beverly Hills in America), museums, and landmarks (e.g., Harajuku in Japan) occupy 21.1%. Fig. 2 illustrates these examples.

IV. SEMANTIC SEGMENTATION FOR TEXT DETECTION

Inspired by the success of the FCN on the semantic segmentation problem, [9]-[11] cast text detection as a segmentation problem and achieved state-of-the-art results. While most conventional algorithms fail at detecting curved text, these algorithms have shown successful results on a limited number of examples, given the lack of an available benchmark. The fact that [10] achieved good results without any of the heuristic grouping rules that most other algorithms need intrigued us to look into this new breed of solution. We fine-tuned DeconvNet [24] and evaluated it on Total-Text; the following section discusses our findings.

A.
DeconvNet

We selected DeconvNet [24] as our investigation tool for two reasons: 1) it achieved state-of-the-art results in semantic segmentation on the PASCAL VOC dataset, and 2) the multiple deconvolutional layers in DeconvNet allow us to observe its behaviour finely, layer by layer. The scope of this paper is not to propose a new solution to the curved text problem; hence we merely convert and fine-tune the network to localize text. For a complete understanding, readers are encouraged to read [24].

Conversion. The last convolutional layer of the original DeconvNet has 21 channels, for the 20 classes of the PASCAL VOC benchmark [25] plus one background class. In this paper, we reduce it to two channels, representing text and non-text. We then fine-tuned the pre-trained model provided by Noh et al. [24] with a one-step training process instead of the two-step process discussed in the original paper. Apart from these changes and the training data, all other training settings were consistent with the original paper.

Training data. Considering the depth of DeconvNet (i.e., 29 convolutional layers and 252M parameters), we pre-trained it using the largest scene text dataset, COCO-text [14]. Images in COCO-text are categorized into legible and illegible text; we trained our network only on legible text, as it closely resembles our dataset. Similar to [9], [10], we first generated binary masks, with 1 indicating text regions and 0 the background. Approximately 15k training images were cropped into 256x256 patches to match the receptive field of DeconvNet. Patches with less than 10% text region were eliminated to prevent an overwhelming amount of non-text data. Roughly 200k and 80k patches of training and validation data were generated, respectively. We augmented the data in parallel to the training with horizontal flipping and random cropping (into 224x224).

B. Experiments

Inference. The inference process was kept as simple as possible.
We resized input images to a fixed resolution, then forward-propagated them through the DeconvNet. To generate the final detection result, the saliency map was binarized using a threshold of 0.5, followed by connected-component

analysis to group text pixels (1s) and bound them tightly with polygons.

Table I: Evaluation of DeconvNet on Total-Text.
Dataset: Total-Text — Recall: 0.33, Precision: 0.40, F-score: 0.36

Figure 10: Successful examples of DeconvNet.
Figure 11: Failure examples of DeconvNet.
Figure 12: Examples of DeconvNet with lower confidence at both ends of the curved text.

Results. The outcomes were evaluated using our evaluation protocol and are listed in Table I. As we went through each of the output saliency maps, we found two consistent causes of such unsatisfactory results: 1) the network is not robust against challenging backgrounds, such as text attached to repeated patterns like bricks, gates, and walls; 2) multiple word candidates are grouped as one. Fig. 11 illustrates some failure examples. We suspect the robustness of the network was affected by its training data: loosely bounded training data with background regions labelled as text could have impacted the training process to a certain extent. Meanwhile, producing line-level output is common among text detection algorithms; we lack a segmentation process to separate such output to word level.

A deeper look into the network. As mentioned before, our primary intention was to investigate the performance of DeconvNet on text of all orientations. With no orientation assumption or heuristic grouping mechanism in the design, we managed to find candidates across texts of all orientations, as illustrated in Fig. 10. We were curious about how and what exactly happens across the deconvolution network, so we cropped a patch of an original image containing curved text, forward-propagated it through the network, and observed the feature maps in several layers of the deconvolution network. As we can see in Fig. 9, at the lower layers we can already notice which parts of the feature map are highly activated.
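The binarise-and-group inference step described above can be sketched as follows; the toy saliency values are hypothetical, and for brevity this sketch returns axis-aligned component boxes rather than the tight polygons used in our pipeline:

```python
from collections import deque

def text_components(saliency, thresh=0.5):
    """Binarise a saliency map and group text pixels by 4-connectivity,
    returning one (min_row, min_col, max_row, max_col) box per component."""
    rows, cols = len(saliency), len(saliency[0])
    mask = [[v >= thresh for v in row] for row in saliency]
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # BFS flood fill from this unvisited text pixel.
                box = [r, c, r, c]
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    box = [min(box[0], y), min(box[1], x),
                           max(box[2], y), max(box[3], x)]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append(tuple(box))
    return boxes

# Toy saliency map with two separated "words".
sal = [[0.9, 0.8, 0.0, 0.0],
       [0.7, 0.9, 0.0, 0.6],
       [0.0, 0.0, 0.0, 0.7]]
print(text_components(sal))  # [(0, 0, 1, 1), (1, 3, 2, 3)]
```

Note that this grouping is exactly what merges adjacent words into one candidate when their saliency regions touch, which is the second failure mode discussed above.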
As the layers proceed, finer details emerge, enriching the region of interest to the extent that we can recognize the characters in it.

Spatial resolution of feature maps is crucial. Text detection systems like [9], [10] adopted FCNs and skip connections in their convolutional networks. Such design elements preserve the spatial resolution of feature maps, which in turn provides better contextual information for the pixel-wise prediction task. Similarly, DeconvNet uses a combination of unpooling layers and learnable upsampling convolution filters to infer bigger feature maps layer after layer. As we can see in Fig. 10, the resulting saliency map is high in resolution and depicts the actual shape and orientation of the detected text region; minimal post-processing is required to retrieve text candidates from it.

Text line supervision is an interesting step forward. Fig. 12 illustrates several examples where the network is not confident about the shape of the curved text regions. We believe this could be improved with the text line supervision leveraged in [10]: when that work showed results without the FTN, performance dropped from 0.84 to 0.5 in terms of F-score.

V. CONCLUSION

This paper introduces a comprehensive scene text dataset, Total-Text, featuring the element missing from current scene text datasets - curved text. We believe that curved text should be included as part of the multi-oriented text detection problem. While it is under-researched at the moment, we hope the availability of Total-Text can change the scene. We fine-tuned DeconvNet and analyzed how it responds to curved text. Spatial resolution of feature maps and contextual

information appeared to be crucial in segmentation-based methods. Such methods are capable of predicting text regions in all sorts of orientations without hard-coded rules. Inspired by this observation, we plan to explore this area further with the aim of designing a scene text detector that is effective against multi-oriented text.

ACKNOWLEDGMENT

This work is partly supported by the Postgraduate Research Grant (PPP) - PG A, from the University of Malaya. The Titan-X GPU used in this research was donated by NVIDIA Corporation. We would also like to express our gratitude to Jia Huei Tan, Yang Loong Chang and Yuen Peng Loh for Total-Text image collection and annotation.

REFERENCES

[1] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. G. i Bigorda, S. R. Mestre, J. Mas, D. F. Mota, J. A. Almazan, and L. P. de las Heras, "ICDAR 2013 robust reading competition," in ICDAR.
[2] C. Yao, X. Bai, W. Liu, Y. Ma, and Z. Tu, "Detecting texts of arbitrary orientations in natural images," in CVPR.
[3] Z. Zhang, W. Shen, C. Yao, and X. Bai, "Symmetry-based text line detection in natural scenes," in CVPR.
[4] W. Huang, Z. Lin, J. Yang, and J. Wang, "Text localization in natural images using stroke feature transform and text covariance descriptors," in ICCV.
[5] L. Neumann and J. Matas, "Scene text localization and recognition with oriented stroke detection," in ICCV.
[6] W. Huang, Y. Qiao, and X. Tang, "Robust scene text detection with convolution neural network induced MSER trees," in ECCV.
[7] B. Shi, X. Bai, and S. Belongie, "Detecting oriented text in natural images by linking segments," in CVPR.
[8] A. Risnumawan, P. Shivakumara, C. S. Chan, and C. L. Tan, "A robust arbitrary text detection system for natural scene images," Expert Systems with Applications, vol. 41, no. 18.
[9] Z. Zhang, C. Zhang, W. Shen, C. Yao, W. Liu, and X. Bai, "Multi-oriented text detection with fully convolutional networks," in CVPR.
[10] T. He, W. Huang, Y. Qiao, and J. Yao, "Accurate text localization in natural image with cascaded convolutional text network," arXiv preprint.
[11] C. Yao, X. Bai, N. Sang, X. Zhou, S. Zhou, and Z. Cao, "Scene text detection via holistic, multi-channel prediction," arXiv preprint.
[12] Q. Ye and D. Doermann, "Text detection and recognition in images and video: a survey," T-PAMI, vol. 37, no. 7, 2014.
[13] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. Ghosh, A. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu et al., "ICDAR 2015 competition on robust reading," in ICDAR.
[14] A. Veit, T. Matera, L. Neumann, J. Matas, and S. Belongie, "COCO-text: Dataset and benchmark for text detection and recognition in natural images," arXiv preprint.
[15] B. Epshtein, E. Ofek, and Y. Wexler, "Detecting text in natural scenes with stroke width transform," in CVPR.
[16] J. Matas, O. Chum, M. Urban, and T. Pajdla, "Robust wide-baseline stereo from maximally stable extremal regions," Image and Vision Computing, vol. 22, no. 10.
[17] T. Wang, D. J. Wu, A. Coates, and A. Y. Ng, "End-to-end text recognition with convolutional neural networks," in ICPR.
[18] M. Jaderberg, A. Vedaldi, and A. Zisserman, "Deep features for text spotting," in ECCV.
[19] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in CVPR.
[20] X.-C. Yin, W.-Y. Pei, J. Zhang, and H.-W. Hao, "Multi-orientation scene text detection with adaptive clustering," T-PAMI, vol. 37, no. 9.
[21] R. B. Adams, The Science of Social Vision. Oxford University Press, 2011, vol. 7.
[22] X.-C. Yin, X. Yin, K. Huang, and H.-W. Hao, "Robust text detection in natural scene images," T-PAMI, vol. 36, no. 5.
[23] C. Wolf and J.-M. Jolion, "Object count/area graphs for the evaluation of object detection and segmentation algorithms," IJDAR, vol. 8, no. 4.
[24] H. Noh, S. Hong, and B. Han, "Learning deconvolution network for semantic segmentation," in ICCV.
[25] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, "The PASCAL visual object classes (VOC) challenge," International Journal of Computer Vision, vol. 88, no. 2.


More information

arxiv: v1 [cs.cv] 23 Apr 2016

arxiv: v1 [cs.cv] 23 Apr 2016 Text Flow: A Unified Text Detection System in Natural Scene Images Shangxuan Tian1, Yifeng Pan2, Chang Huang2, Shijian Lu3, Kai Yu2, and Chew Lim Tan1 arxiv:1604.06877v1 [cs.cv] 23 Apr 2016 1 School of

More information

Photo OCR ( )

Photo OCR ( ) Photo OCR (2017-2018) Xiang Bai Huazhong University of Science and Technology Outline VALSE2018, DaLian Xiang Bai 2 Deep Direct Regression for Multi-Oriented Scene Text Detection [He et al., ICCV, 2017.]

More information

An Efficient Method to Extract Digital Text From Scanned Image Text

An Efficient Method to Extract Digital Text From Scanned Image Text An Efficient Method to Extract Digital Text From Scanned Image Text Jenick Johnson ECE Dept., Christ the King Engineering College Coimbatore-641104, Tamil Nadu, India Suresh Babu. V ECE Dept., Christ the

More information

arxiv: v2 [cs.cv] 10 Jul 2017

arxiv: v2 [cs.cv] 10 Jul 2017 EAST: An Efficient and Accurate Scene Text Detector Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, and Jiajun Liang Megvii Technology Inc., Beijing, China {zxy, yaocong, wenhe, wangyuzhi,

More information

Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping

Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping Chuhui Xue [0000 0002 3562 3094], Shijian Lu [0000 0002 6766 2506], and Fangneng Zhan [0000 0003 1502 6847] School of

More information

Feature Fusion for Scene Text Detection

Feature Fusion for Scene Text Detection 2018 13th IAPR International Workshop on Document Analysis Systems Feature Fusion for Scene Text Detection Zhen Zhu, Minghui Liao, Baoguang Shi, Xiang Bai School of Electronic Information and Communications

More information

Channel Locality Block: A Variant of Squeeze-and-Excitation

Channel Locality Block: A Variant of Squeeze-and-Excitation Channel Locality Block: A Variant of Squeeze-and-Excitation 1 st Huayu Li Northern Arizona University Flagstaff, United State Northern Arizona University hl459@nau.edu arxiv:1901.01493v1 [cs.lg] 6 Jan

More information

Towards Visual Words to Words

Towards Visual Words to Words Towards Visual Words to Words Text Detection with a General Bag of Words Representation Rakesh Mehta Dept. of Signal Processing, Tampere Univ. of Technology in Tampere Ondřej Chum, Jiří Matas Centre for

More information

TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes Shangbang Long 1,2[0000 0002 4089 5369], Jiaqiang Ruan 1,2, Wenjie Zhang 1,2,,Xin He 2, Wenhao Wu 2, Cong Yao 2[0000 0001 6564

More information

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK 1 Po-Jen Lai ( 賴柏任 ), 2 Chiou-Shann Fuh ( 傅楸善 ) 1 Dept. of Electrical Engineering, National Taiwan University, Taiwan 2 Dept.

More information

Available online at ScienceDirect. Procedia Computer Science 96 (2016 )

Available online at   ScienceDirect. Procedia Computer Science 96 (2016 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 96 (2016 ) 1409 1417 20th International Conference on Knowledge Based and Intelligent Information and Engineering Systems,

More information

Content-Based Image Recovery

Content-Based Image Recovery Content-Based Image Recovery Hong-Yu Zhou and Jianxin Wu National Key Laboratory for Novel Software Technology Nanjing University, China zhouhy@lamda.nju.edu.cn wujx2001@nju.edu.cn Abstract. We propose

More information

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang SSD: Single Shot MultiBox Detector Author: Wei Liu et al. Presenter: Siyu Jiang Outline 1. Motivations 2. Contributions 3. Methodology 4. Experiments 5. Conclusions 6. Extensions Motivation Motivation

More information

arxiv: v1 [cs.cv] 2 Jan 2019

arxiv: v1 [cs.cv] 2 Jan 2019 Detecting Text in the Wild with Deep Character Embedding Network Jiaming Liu, Chengquan Zhang, Yipeng Sun, Junyu Han, and Errui Ding Baidu Inc, Beijing, China. {liujiaming03,zhangchengquan,yipengsun,hanjunyu,dingerrui}@baidu.com

More information

Efficient Segmentation-Aided Text Detection For Intelligent Robots

Efficient Segmentation-Aided Text Detection For Intelligent Robots Efficient Segmentation-Aided Text Detection For Intelligent Robots Junting Zhang, Yuewei Na, Siyang Li, C.-C. Jay Kuo University of Southern California Outline Problem Definition and Motivation Related

More information

Scene Text Detection Using Machine Learning Classifiers

Scene Text Detection Using Machine Learning Classifiers 601 Scene Text Detection Using Machine Learning Classifiers Nafla C.N. 1, Sneha K. 2, Divya K.P. 3 1 (Department of CSE, RCET, Akkikkvu, Thrissur) 2 (Department of CSE, RCET, Akkikkvu, Thrissur) 3 (Department

More information

Supplementary Material: Unconstrained Salient Object Detection via Proposal Subset Optimization

Supplementary Material: Unconstrained Salient Object Detection via Proposal Subset Optimization Supplementary Material: Unconstrained Salient Object via Proposal Subset Optimization 1. Proof of the Submodularity According to Eqns. 10-12 in our paper, the objective function of the proposed optimization

More information

arxiv: v2 [cs.cv] 27 Feb 2018

arxiv: v2 [cs.cv] 27 Feb 2018 Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation arxiv:1802.08948v2 [cs.cv] 27 Feb 2018 Pengyuan Lyu 1, Cong Yao 2, Wenhao Wu 2, Shuicheng Yan 3, Xiang Bai 1 1 Huazhong

More information

Detecting Printed and Handwritten Partial Copies of Line Drawings Embedded in Complex Backgrounds

Detecting Printed and Handwritten Partial Copies of Line Drawings Embedded in Complex Backgrounds 9 1th International Conference on Document Analysis and Recognition Detecting Printed and Handwritten Partial Copies of Line Drawings Embedded in Complex Backgrounds Weihan Sun, Koichi Kise Graduate School

More information

Structured Prediction using Convolutional Neural Networks

Structured Prediction using Convolutional Neural Networks Overview Structured Prediction using Convolutional Neural Networks Bohyung Han bhhan@postech.ac.kr Computer Vision Lab. Convolutional Neural Networks (CNNs) Structured predictions for low level computer

More information

YOLO9000: Better, Faster, Stronger

YOLO9000: Better, Faster, Stronger YOLO9000: Better, Faster, Stronger Date: January 24, 2018 Prepared by Haris Khan (University of Toronto) Haris Khan CSC2548: Machine Learning in Computer Vision 1 Overview 1. Motivation for one-shot object

More information

Latest development in image feature representation and extraction

Latest development in image feature representation and extraction International Journal of Advanced Research and Development ISSN: 2455-4030, Impact Factor: RJIF 5.24 www.advancedjournal.com Volume 2; Issue 1; January 2017; Page No. 05-09 Latest development in image

More information

arxiv: v1 [cs.cv] 31 Mar 2016

arxiv: v1 [cs.cv] 31 Mar 2016 Object Boundary Guided Semantic Segmentation Qin Huang, Chunyang Xia, Wenchao Zheng, Yuhang Song, Hao Xu and C.-C. Jay Kuo arxiv:1603.09742v1 [cs.cv] 31 Mar 2016 University of Southern California Abstract.

More information

Deep Direct Regression for Multi-Oriented Scene Text Detection

Deep Direct Regression for Multi-Oriented Scene Text Detection Deep Direct Regression for Multi-Oriented Scene Text Detection Wenhao He 1,2 Xu-Yao Zhang 1 Fei Yin 1 Cheng-Lin Liu 1,2 1 National Laboratory of Pattern Recognition (NLPR) Institute of Automation, Chinese

More information

ABSTRACT 1. INTRODUCTION 2. RELATED WORK

ABSTRACT 1. INTRODUCTION 2. RELATED WORK Improving text recognition by distinguishing scene and overlay text Bernhard Quehl, Haojin Yang, Harald Sack Hasso Plattner Institute, Potsdam, Germany Email: {bernhard.quehl, haojin.yang, harald.sack}@hpi.de

More information

Aggregating Local Context for Accurate Scene Text Detection

Aggregating Local Context for Accurate Scene Text Detection Aggregating Local Context for Accurate Scene Text Detection Dafang He 1, Xiao Yang 2, Wenyi Huang, 1, Zihan Zhou 1, Daniel Kifer 2, and C.Lee Giles 1 1 Information Science and Technology, Penn State University

More information

A process for text recognition of generic identification documents over cloud computing

A process for text recognition of generic identification documents over cloud computing 142 Int'l Conf. IP, Comp. Vision, and Pattern Recognition IPCV'16 A process for text recognition of generic identification documents over cloud computing Rodolfo Valiente, Marcelo T. Sadaike, José C. Gutiérrez,

More information

Supplementary Material: Pixelwise Instance Segmentation with a Dynamically Instantiated Network

Supplementary Material: Pixelwise Instance Segmentation with a Dynamically Instantiated Network Supplementary Material: Pixelwise Instance Segmentation with a Dynamically Instantiated Network Anurag Arnab and Philip H.S. Torr University of Oxford {anurag.arnab, philip.torr}@eng.ox.ac.uk 1. Introduction

More information

An Approach to Detect Text and Caption in Video

An Approach to Detect Text and Caption in Video An Approach to Detect Text and Caption in Video Miss Megha Khokhra 1 M.E Student Electronics and Communication Department, Kalol Institute of Technology, Gujarat, India ABSTRACT The video image spitted

More information

Real-time Object Detection CS 229 Course Project

Real-time Object Detection CS 229 Course Project Real-time Object Detection CS 229 Course Project Zibo Gong 1, Tianchang He 1, and Ziyi Yang 1 1 Department of Electrical Engineering, Stanford University December 17, 2016 Abstract Objection detection

More information

An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches

An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches Zhuoyao Zhong 1,2,*, Lei Sun 2, Qiang Huo 2 1 School of EIE., South China University of Technology, Guangzhou, China

More information

WordSup: Exploiting Word Annotations for Character based Text Detection

WordSup: Exploiting Word Annotations for Character based Text Detection WordSup: Exploiting Word Annotations for Character based Text Detection Han Hu 1 Chengquan Zhang 2 Yuxuan Luo 2 Yuzhuo Wang 2 Junyu Han 2 Errui Ding 2 Microsoft Research Asia 1 IDL, Baidu Research 2 hanhu@microsoft.com

More information

Scene Text Recognition for Augmented Reality. Sagar G V Adviser: Prof. Bharadwaj Amrutur Indian Institute Of Science

Scene Text Recognition for Augmented Reality. Sagar G V Adviser: Prof. Bharadwaj Amrutur Indian Institute Of Science Scene Text Recognition for Augmented Reality Sagar G V Adviser: Prof. Bharadwaj Amrutur Indian Institute Of Science Outline Research area and motivation Finding text in natural scenes Prior art Improving

More information

Geometry-aware Traffic Flow Analysis by Detection and Tracking

Geometry-aware Traffic Flow Analysis by Detection and Tracking Geometry-aware Traffic Flow Analysis by Detection and Tracking 1,2 Honghui Shi, 1 Zhonghao Wang, 1,2 Yang Zhang, 1,3 Xinchao Wang, 1 Thomas Huang 1 IFP Group, Beckman Institute at UIUC, 2 IBM Research,

More information

CAP 6412 Advanced Computer Vision

CAP 6412 Advanced Computer Vision CAP 6412 Advanced Computer Vision http://www.cs.ucf.edu/~bgong/cap6412.html Boqing Gong April 21st, 2016 Today Administrivia Free parameters in an approach, model, or algorithm? Egocentric videos by Aisha

More information

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Si Chen The George Washington University sichen@gwmail.gwu.edu Meera Hahn Emory University mhahn7@emory.edu Mentor: Afshin

More information

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS Kuan-Chuan Peng and Tsuhan Chen School of Electrical and Computer Engineering, Cornell University, Ithaca, NY

More information

Text Detection and Extraction from Natural Scene: A Survey Tajinder Kaur 1 Post-Graduation, Department CE, Punjabi University, Patiala, Punjab India

Text Detection and Extraction from Natural Scene: A Survey Tajinder Kaur 1 Post-Graduation, Department CE, Punjabi University, Patiala, Punjab India Volume 3, Issue 3, March 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com ISSN:

More information

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK Wenjie Guan, YueXian Zou*, Xiaoqun Zhou ADSPLAB/Intelligent Lab, School of ECE, Peking University, Shenzhen,518055, China

More information

Automatically Algorithm for Physician s Handwritten Segmentation on Prescription

Automatically Algorithm for Physician s Handwritten Segmentation on Prescription Automatically Algorithm for Physician s Handwritten Segmentation on Prescription Narumol Chumuang 1 and Mahasak Ketcham 2 Department of Information Technology, Faculty of Information Technology, King Mongkut's

More information

Multi-script Text Extraction from Natural Scenes

Multi-script Text Extraction from Natural Scenes Multi-script Text Extraction from Natural Scenes Lluís Gómez and Dimosthenis Karatzas Computer Vision Center Universitat Autònoma de Barcelona Email: {lgomez,dimos}@cvc.uab.es Abstract Scene text extraction

More information

CS231A Course Project Final Report Sign Language Recognition with Unsupervised Feature Learning

CS231A Course Project Final Report Sign Language Recognition with Unsupervised Feature Learning CS231A Course Project Final Report Sign Language Recognition with Unsupervised Feature Learning Justin Chen Stanford University justinkchen@stanford.edu Abstract This paper focuses on experimenting with

More information

LEVERAGING SURROUNDING CONTEXT FOR SCENE TEXT DETECTION

LEVERAGING SURROUNDING CONTEXT FOR SCENE TEXT DETECTION LEVERAGING SURROUNDING CONTEXT FOR SCENE TEXT DETECTION Yao Li 1, Chunhua Shen 1, Wenjing Jia 2, Anton van den Hengel 1 1 The University of Adelaide, Australia 2 University of Technology, Sydney, Australia

More information

A Hierarchical Visual Saliency Model for Character Detection in Natural Scenes

A Hierarchical Visual Saliency Model for Character Detection in Natural Scenes A Hierarchical Visual Saliency Model for Character Detection in Natural Scenes Renwu Gao 1, Faisal Shafait 2, Seiichi Uchida 3, and Yaokai Feng 3 1 Information Sciene and Electrical Engineering, Kyushu

More information

Detecting and Recognizing Text in Natural Images using Convolutional Networks

Detecting and Recognizing Text in Natural Images using Convolutional Networks Detecting and Recognizing Text in Natural Images using Convolutional Networks Aditya Srinivas Timmaraju, Vikesh Khanna Stanford University Stanford, CA - 94305 adityast@stanford.edu, vikesh@stanford.edu

More information

Reading Text in the Wild from Compressed Images

Reading Text in the Wild from Compressed Images Reading Text in the Wild from Compressed Images Leonardo Galteri leonardo.galteri@unifi.it Marco Bertini marco.bertini@unifi.it Dimosthenis Karatzas CVC, Barcelona dimos@cvc.uab.es Dena Bazazian CVC, Barcelona

More information

Scene text recognition: no country for old men?

Scene text recognition: no country for old men? Scene text recognition: no country for old men? Lluís Gómez and Dimosthenis Karatzas Computer Vision Center Universitat Autònoma de Barcelona Email: {lgomez,dimos}@cvc.uab.es Abstract. It is a generally

More information

A Laplacian Based Novel Approach to Efficient Text Localization in Grayscale Images

A Laplacian Based Novel Approach to Efficient Text Localization in Grayscale Images A Laplacian Based Novel Approach to Efficient Text Localization in Grayscale Images Karthik Ram K.V & Mahantesh K Department of Electronics and Communication Engineering, SJB Institute of Technology, Bangalore,

More information

TextField: Learning A Deep Direction Field for Irregular Scene Text Detection

TextField: Learning A Deep Direction Field for Irregular Scene Text Detection 1 TextField: Learning A Deep Direction Field for Irregular Scene Text Detection Yongchao Xu, Yukang Wang, Wei Zhou, Yongpan Wang, Zhibo Yang, Xiang Bai, Senior Member, IEEE arxiv:1812.01393v1 [cs.cv] 4

More information

Robust Face Recognition Based on Convolutional Neural Network

Robust Face Recognition Based on Convolutional Neural Network 2017 2nd International Conference on Manufacturing Science and Information Engineering (ICMSIE 2017) ISBN: 978-1-60595-516-2 Robust Face Recognition Based on Convolutional Neural Network Ying Xu, Hui Ma,

More information

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs Zhipeng Yan, Moyuan Huang, Hao Jiang 5/1/2017 1 Outline Background semantic segmentation Objective,

More information

Fast scene understanding and prediction for autonomous platforms. Bert De Brabandere, KU Leuven, October 2017

Fast scene understanding and prediction for autonomous platforms. Bert De Brabandere, KU Leuven, October 2017 Fast scene understanding and prediction for autonomous platforms Bert De Brabandere, KU Leuven, October 2017 Who am I? MSc in Electrical Engineering at KU Leuven, Belgium Last year PhD student with Luc

More information

arxiv: v1 [cs.cv] 4 Dec 2017

arxiv: v1 [cs.cv] 4 Dec 2017 Enhanced Characterness for Text Detection in the Wild Aarushi Agrawal 2, Prerana Mukherjee 1, Siddharth Srivastava 1, and Brejesh Lall 1 arxiv:1712.04927v1 [cs.cv] 4 Dec 2017 1 Department of Electrical

More information

arxiv: v1 [cs.cv] 16 Nov 2015

arxiv: v1 [cs.cv] 16 Nov 2015 Coarse-to-fine Face Alignment with Multi-Scale Local Patch Regression Zhiao Huang hza@megvii.com Erjin Zhou zej@megvii.com Zhimin Cao czm@megvii.com arxiv:1511.04901v1 [cs.cv] 16 Nov 2015 Abstract Facial

More information

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia Deep learning for dense per-pixel prediction Chunhua Shen The University of Adelaide, Australia Image understanding Classification error Convolution Neural Networks 0.3 0.2 0.1 Image Classification [Krizhevsky

More information

Rotation-sensitive Regression for Oriented Scene Text Detection

Rotation-sensitive Regression for Oriented Scene Text Detection Rotation-sensitive Regression for Oriented Scene Text Detection Minghui Liao 1, Zhen Zhu 1, Baoguang Shi 1, Gui-song Xia 2, Xiang Bai 1 1 Huazhong University of Science and Technology 2 Wuhan University

More information

JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS. Zhao Chen Machine Learning Intern, NVIDIA

JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS. Zhao Chen Machine Learning Intern, NVIDIA JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS Zhao Chen Machine Learning Intern, NVIDIA ABOUT ME 5th year PhD student in physics @ Stanford by day, deep learning computer vision scientist

More information

TEXTS in scenes contain high level semantic information

TEXTS in scenes contain high level semantic information 1 ESIR: End-to-end Scene Text Recognition via Iterative Rectification Fangneng Zhan and Shijian Lu arxiv:1812.05824v1 [cs.cv] 14 Dec 2018 Abstract Automated recognition of various texts in scenes has been

More information

arxiv: v1 [cs.cv] 22 Aug 2017

arxiv: v1 [cs.cv] 22 Aug 2017 WordSup: Exploiting Word Annotations for Character based Text Detection Han Hu 1 Chengquan Zhang 2 Yuxuan Luo 2 Yuzhuo Wang 2 Junyu Han 2 Errui Ding 2 Microsoft Research Asia 1 IDL, Baidu Research 2 hanhu@microsoft.com

More information

Reading Text in the Wild from Compressed Images

Reading Text in the Wild from Compressed Images Reading Text in the Wild from Compressed Images Leonardo Galteri University of Florence leonardo.galteri@unifi.it Marco Bertini University of Florence marco.bertini@unifi.it Dimosthenis Karatzas CVC, Barcelona

More information

Fully Convolutional Networks for Semantic Segmentation

Fully Convolutional Networks for Semantic Segmentation Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell UC Berkeley Chaim Ginzburg for Deep Learning seminar 1 Semantic Segmentation Define a pixel-wise labeling

More information

Text Extraction from Natural Scene Images and Conversion to Audio in Smart Phone Applications

Text Extraction from Natural Scene Images and Conversion to Audio in Smart Phone Applications Text Extraction from Natural Scene Images and Conversion to Audio in Smart Phone Applications M. Prabaharan 1, K. Radha 2 M.E Student, Department of Computer Science and Engineering, Muthayammal Engineering

More information

Detecting Oriented Text in Natural Images by Linking Segments

Detecting Oriented Text in Natural Images by Linking Segments Detecting Oriented Text in Natural Images by Linking Segments Baoguang Shi 1 Xiang Bai 1 Serge Belongie 2 1 School of EIC, Huazhong University of Science and Technology 2 Department of Computer Science,

More information

12/12 A Chinese Words Detection Method in Camera Based Images Qingmin Chen, Yi Zhou, Kai Chen, Li Song, Xiaokang Yang Institute of Image Communication

12/12 A Chinese Words Detection Method in Camera Based Images Qingmin Chen, Yi Zhou, Kai Chen, Li Song, Xiaokang Yang Institute of Image Communication A Chinese Words Detection Method in Camera Based Images Qingmin Chen, Yi Zhou, Kai Chen, Li Song, Xiaokang Yang Institute of Image Communication and Information Processing, Shanghai Key Laboratory Shanghai

More information

Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers

Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers A. Salhi, B. Minaoui, M. Fakir, H. Chakib, H. Grimech Faculty of science and Technology Sultan Moulay Slimane

More information

Correcting User Guided Image Segmentation

Correcting User Guided Image Segmentation Correcting User Guided Image Segmentation Garrett Bernstein (gsb29) Karen Ho (ksh33) Advanced Machine Learning: CS 6780 Abstract We tackle the problem of segmenting an image into planes given user input.

More information

DETECTING TEXTUAL INFORMATION IN IMAGES FROM ONION DOMAINS USING TEXT SPOTTING

DETECTING TEXTUAL INFORMATION IN IMAGES FROM ONION DOMAINS USING TEXT SPOTTING Actas de las XXXIX Jornadas de Automática, Badajoz, 5-7 de Septiembre de 2018 DETECTING TEXTUAL INFORMATION IN IMAGES FROM ONION DOMAINS USING TEXT SPOTTING Pablo Blanco Dept. IESA. Universidad de León,

More information

Andrei Polzounov (Universitat Politecnica de Catalunya, Barcelona, Spain), Artsiom Ablavatski (A*STAR Institute for Infocomm Research, Singapore),

Andrei Polzounov (Universitat Politecnica de Catalunya, Barcelona, Spain), Artsiom Ablavatski (A*STAR Institute for Infocomm Research, Singapore), WordFences: Text Localization and Recognition ICIP 2017 Andrei Polzounov (Universitat Politecnica de Catalunya, Barcelona, Spain), Artsiom Ablavatski (A*STAR Institute for Infocomm Research, Singapore),

More information

Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes

Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes Fangneng Zhan 1[0000 0003 1502 6847], Shijian Lu 2[0000 0002 6766 2506], and Chuhui Xue 3[0000 0002 3562 3094] School

More information

THE SPEED-LIMIT SIGN DETECTION AND RECOGNITION SYSTEM

THE SPEED-LIMIT SIGN DETECTION AND RECOGNITION SYSTEM THE SPEED-LIMIT SIGN DETECTION AND RECOGNITION SYSTEM Kuo-Hsin Tu ( 塗國星 ), Chiou-Shann Fuh ( 傅楸善 ) Dept. of Computer Science and Information Engineering, National Taiwan University, Taiwan E-mail: p04922004@csie.ntu.edu.tw,

More information

Scene Text Detection via Holistic, Multi-Channel Prediction

Scene Text Detection via Holistic, Multi-Channel Prediction Scene Text Detection via Holistic, Multi-Channel Prediction Cong Yao1,2, Xiang Bai1, Nong Sang1, Xinyu Zhou2, Shuchang Zhou2, Zhimin Cao2 1 arxiv:1606.09002v1 [cs.cv] 29 Jun 2016 Huazhong University of

More information

Speeding up the Detection of Line Drawings Using a Hash Table

Speeding up the Detection of Line Drawings Using a Hash Table Speeding up the Detection of Line Drawings Using a Hash Table Weihan Sun, Koichi Kise 2 Graduate School of Engineering, Osaka Prefecture University, Japan sunweihan@m.cs.osakafu-u.ac.jp, 2 kise@cs.osakafu-u.ac.jp

More information

arxiv: v3 [cs.cv] 2 Jun 2017

arxiv: v3 [cs.cv] 2 Jun 2017 Incorporating the Knowledge of Dermatologists to Convolutional Neural Networks for the Diagnosis of Skin Lesions arxiv:1703.01976v3 [cs.cv] 2 Jun 2017 Iván González-Díaz Department of Signal Theory and

More information

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma Mask R-CNN presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma Mask R-CNN Background Related Work Architecture Experiment Mask R-CNN Background Related Work Architecture Experiment Background From left

More information

Layout Segmentation of Scanned Newspaper Documents

Layout Segmentation of Scanned Newspaper Documents , pp-05-10 Layout Segmentation of Scanned Newspaper Documents A.Bandyopadhyay, A. Ganguly and U.Pal CVPR Unit, Indian Statistical Institute 203 B T Road, Kolkata, India. Abstract: Layout segmentation algorithms

More information

Yiqi Yan. May 10, 2017

Yiqi Yan. May 10, 2017 Yiqi Yan May 10, 2017 P a r t I F u n d a m e n t a l B a c k g r o u n d s Convolution Single Filter Multiple Filters 3 Convolution: case study, 2 filters 4 Convolution: receptive field receptive field

More information

TextBoxes++: A Single-Shot Oriented Scene Text Detector

TextBoxes++: A Single-Shot Oriented Scene Text Detector 1 TextBoxes++: A Single-Shot Oriented Scene Text Detector Minghui Liao, Baoguang Shi, Xiang Bai, Senior Member, IEEE arxiv:1801.02765v3 [cs.cv] 27 Apr 2018 Abstract Scene text detection is an important

More information

ICDAR 2013 Robust Reading Competition
