Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition
Chee Kheng Ch'ng, Chee Seng Chan
Centre of Image & Signal Processing, Faculty of Computer Science & Info. Technology, University of Malaya, Malaysia
chngcheekheng@siswa.um.edu.my, cs.chan@um.edu.my

Abstract
Curved text is one of the common text orientations in real-world environments, yet it is almost entirely absent from well-received scene text datasets such as ICDAR 2013 and MSRA-TD500. The main motivation of Total-Text is to fill this gap and open a new research direction for the scene text community. On top of conventional horizontal and multi-oriented text, it features curved text. Total-Text is highly diversified in orientation: more than half of its images contain a combination of two or more orientations. Recently, a new breed of solutions that casts text detection as a segmentation problem has demonstrated its effectiveness on multi-oriented text. To evaluate its robustness against curved text, we fine-tuned DeconvNet and benchmarked it on Total-Text. Total-Text with its annotation is available at

Keywords: Scene text dataset; Curve-oriented text; Segmentation-based text detection

I. INTRODUCTION
Scene text detection is an active computer vision topic due to the growing demand from applications such as multimedia retrieval, industrial automation and assistive devices for vision-impaired people. Given a natural scene image, the goal of text detection is to determine whether text is present and, if so, return its location. Well-known public datasets such as ICDAR 2003, 2011 and 2013 [1] (termed ICDARs from here onwards) and MSRA-TD500 [2] have played a significant role in building the momentum of scene text research. One thing all images in the ICDARs have in common is that the text is horizontally oriented [12].
This observation has inspired researchers to incorporate a horizontal assumption [3]-[7] when solving the scene text detection problem. In 2012, Yao et al. [2] introduced a new scene text dataset, MSRA-TD500, that challenged the community with text arranged in multiple orientations. Its popularity in turn defined the convention for multi-oriented text. However, a closer look at MSRA-TD500 reveals that most, if not all, of its text is still arranged along a straight line, as in the ICDARs (more details in Section III). Curve-oriented text (termed curved text from here onwards) is common, yet it is missing from current studies. To the best of our knowledge, CUTE80 [8] is the only scene text dataset to date with curved text. However, it is too small, with only 80 images, and its scene diversity is minimal.

Figure 1: Annotation details of Total-Text, including transcription, polygon-shaped and rectangular bounding box vertices, orientations, care and do-not-care regions, and binary mask.

Without a proper dataset to motivate it, little effort has gone into the curved text detection problem. This brings us to the primary contribution of this paper: Total-Text, a scene text dataset collected with curved text in mind, which fills the orientation gap in existing scene text datasets. It has 1,555 scene images and 9,330 annotated words in three text orientations: horizontal, multi-oriented and curved.

Orientation assumptions are common in text detection algorithms. We believe that heuristics designed for specific text orientations hold back the generalization of text detection systems to real-world text with unconstrained orientations. Recent works [9]-[11] have started to cast text detection as a semantic segmentation problem and achieved state-of-the-art results on the ICDAR 2011, ICDAR 2013 and MSRA-TD500 datasets.
They have reported successful detection of curved text as well. The system of He et al. [10] in particular has no orientation assumption and no heuristic grouping mechanism. This brings us to the secondary contribution of this paper: we look into this new type of solution and reveal how it handles text of multiple orientations in natural scenes.

Figure 2: Curved text is commonly seen in real-world scenery.

II. RELATED WORKS
This section discusses closely related works, specifically scene text datasets and text detection systems. For a comprehensive review, readers are referred to the survey [12].

A. Scene Text Datasets
The ICDARs [1] have three variants. ICDAR 2003 started out with 509 camera-captured scene text images, all containing horizontally oriented text. In ICDAR 2011, the number of images was reduced to 484 to eliminate duplicates from the previous version. ICDAR 2013 further trimmed the 2011 version down to 462 images of horizontal English text and expanded its text categories and tasks. Recently, ICDAR launched a new challenge [13], Incidental Scene Text (also known as ICDAR 2015), based on 1,670 images captured with wearable devices. It is more challenging than previous datasets because it includes arbitrarily oriented text, most of which is out of focus.

MSRA-TD500 [2] was introduced in 2012 to address the lack of arbitrarily oriented text in scene text datasets. It has 300 training and 200 testing images, annotated with minimum-area rectangles. COCO-text [14], released in early 2016, is the largest scene text dataset to date, with 63,686 images and 173,589 labeled text regions. This large-scale dataset contains all varieties of text orientation: horizontal, arbitrary and curved. However, its groundtruth uses axis-aligned rectangles, which suit only horizontal and vertical text. To the best of our knowledge, CUTE80 [8] is the only publicly available curved text dataset. It has only 80 images and limited scenery.

B. Scene Text Detection
Scene text detection has seen significant progress since the seminal work of Epshtein et al. [15] and Neumann and Matas [16].
In the former, the Stroke Width Transform (SWT) was proposed to detect text. Components with similar stroke widths are grouped, and their properties are studied to classify them. In the latter, Maximally Stable Extremal Regions (MSER) were exploited to extract text components, and a classifier working on the components' geometric properties was used to detect text. Both represent characters better than feature extractors based on color, edge or texture. After extracting potential character candidates, these connected-component-based algorithms typically go through text line generation, candidate filtering and segmentation, as pointed out in the survey [12].

As in many other computer vision tasks, incorporating Convolutional Neural Networks (CNNs) into text localization is currently a very active research area. Huang et al. [6] trained a character classifier to examine components generated by MSER, with the aim of making the feature extraction process more robust. Alongside this work, [17] and [18] also trained CNNs to separate text components from non-text ones. This line of work demonstrated the high discriminative power of CNNs as feature extractors. Interestingly, however, Zhang et al. [9] argued that using a CNN as a character detector restricts its potential because of the local nature of characters. Zhang et al. trained two Fully Convolutional Networks (FCNs) [19]: (1) a Text-Block FCN that considers local and global contextual information simultaneously to identify text regions in an image, and (2) a Character-Centroid FCN to eliminate false text line candidates. However, text line generation, which plays a key role in grouping characters into words, did not benefit much from the robustness of CNNs. While most algorithms [9], [18] handcrafted the text line generation process, He et al. [10] trained an FCN to infer text line candidates.
By cascading supervised FCNs for text regions and text lines, the Cascaded Convolutional Text Network (CCTN) generalizes across text orientations. It is one of the best-performing systems on both the horizontal ICDAR 2013 and the arbitrarily oriented MSRA-TD500 scene text datasets.

III. TOTAL-TEXT DATASET
This section discusses (a) the motivation for collecting Total-Text; (b) observations on horizontal, multi-oriented and curved text; (c) the orientation assumptions in current state-of-the-art algorithms; and (d) the different aspects and statistics of Total-Text.

A. Dataset Attributes
Curved text is an overlooked problem, and its absence from existing scene text datasets motivated us to collect this dataset. Curved text is easy to find in real-life scenes such as business logos, signs and entrances, as depicted in Fig. 7d. Surprisingly, such data is almost entirely absent from current datasets [1], [2], [13]. The ICDARs, the most popular scene text datasets of the past decade, contain only horizontal text [12]. Consequently, the vast majority of algorithms assume text linearity to tackle the problem effectively. As a result of this overwhelming attention, text detection performance on the ICDARs has saturated at a rather high point (0.9 in terms of f-score). Meanwhile, multi-oriented text has also received a certain amount of attention from the community. MSRA-TD500 is a well-known dataset that introduced this challenge to the field, and algorithms such as [9], [20] were designed to cater for multi-oriented text. To the best of our knowledge, scene text detection algorithms designed with curved orientation in mind [8] are relatively rare. We believe the lack of a suitable dataset is the obvious reason the community has overlooked this problem. Hence, we propose Total-Text, with 4,265 curved text instances out of 9,330 in total, hoping to spur the community's interest in curved text.

Figure 3: 1st row: Examples from ICDAR 2013, ICDAR 2015 and MSRA-TD500; 2nd row: Slightly curved to extremely curved text examples from Total-Text.

Figure 4: Current state-of-the-art solutions could not detect curved text effectively. (a) Yin et al. [22] (red bounding box) and Huang et al. [6] (blue bounding box); (b) Shi et al. [7].

Curved text observation. Geometrically speaking, a straight line has no angle variation along its length and can thus be described by a linear function, y = mx + c. A curved line carries no such restriction, and its angle may vary anywhere along it. From the scene text perspective, we observed that horizontally oriented text is a series of characters that can be connected by a straight line, in most cases along their bottom alignment.
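The straight-line versus curved distinction can be checked numerically. The sketch below is a hypothetical illustration and not part of the Total-Text tooling. It assumes each word is summarized by its character centroids (synthetic values here) and compares the least-squares residual of a straight-line fit y = mx + c with that of a quadratic, the lowest-order polynomial that can bend.

```python
import numpy as np


def fit_rms(xs, ys, degree):
    """RMS residual of a least-squares polynomial fit of the given degree."""
    coeffs = np.polyfit(xs, ys, degree)
    residual = ys - np.polyval(coeffs, xs)
    return float(np.sqrt(np.mean(residual ** 2)))


# Synthetic character centroids (in pixels) for three 7-character words.
xs = np.linspace(0.0, 120.0, 7)
words = {
    "horizontal": np.full_like(xs, 50.0),           # slope m = 0
    "multi-oriented": 0.4 * xs + 10.0,              # one constant angle offset
    "curved": 50.0 + 0.01 * (xs - 60.0) ** 2,       # symmetric arc
}

for name, ys in words.items():
    line = fit_rms(xs, ys, 1)        # y = mx + c
    quad = fit_rms(xs, ys, 2)        # lowest-order curve
    print(f"{name:>14}: straight-line RMS = {line:6.2f} px, quadratic RMS = {quad:6.2f} px")
```

The straight line fits both the horizontal and the multi-oriented word essentially exactly. On the symmetric arc, it leaves an RMS residual of about 13.9 px (the square root of 192), while the quadratic fits the arc essentially exactly. A degree-2 fit suits symmetric arcs. Wavy text would need a higher degree.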
Multi-oriented text, by scene text convention, can likewise be connected by a straight line, given an angle offset with respect to the horizontal. In contrast, the characters in a curved word do not share a single angle offset; at the text level they are better fitted by a polynomial curve (see Fig. 3 for examples). While collecting the dataset, we found that curved text in natural images ranges from slightly curved to extremely curved. It is also unsurprising that most of it takes the shape of a symmetric arc, given the preference for symmetry in human vision [21].

Orientation assumption. We observed that many algorithms [3]-[6], [9], [20] depend on an orientation assumption. We took a closer look at this aspect of existing text detection algorithms to see how it fits our observations on curved text. We focused mainly on systems whose authors claim multi-oriented detection capability and report results on MSRA-TD500. Zhang et al. [9] first use an FCN to create a saliency map and generate text blocks. The system then draws straight lines through the middle point of each generated text block, aiming to hit as many character components as possible. The line whose angle offset hits the most components is taken as the text line for the subsequent step. We believe this mechanism would not work on our dataset, as a straight line misses the polynomial nature of curved text. Yin et al. [20] focused on the text candidate construction stage to detect multi-oriented text. Their algorithm first clusters character pairs with a consistent orientation or perspective view into the same group. However, as Fig. 3 shows (second row, second and third images in particular), characters in a single curved word can vary widely in orientation. In fact, both of these algorithms, along with [7], reported failures on the same curved text images in MSRA-TD500, as illustrated in Fig. 4b. It is worth noting that MSRA-TD500 has only 2 curved text instances in the entire dataset. Last but not least, we ran [22] and [6] on several images of Total-Text; the results can be seen in Fig. 4.

Figure 5: Comparison between a conventional rectangular bounding box (red) and the proposed polygon-shaped bounding region (green) in Total-Text. The polygon shape is the better candidate for groundtruth.

Figure 6: Total-Text is challenging due to its highly diversified orientation compositions and scenery. (a) Various text orientations (from left to right). Top (one orientation): HC; VC; Cir and W. Middle (two orientations): Cir+H; MO+HC; W+H. Bottom (three orientations): H+MO+VC; H+MO+HC; H+MO+Cir. (b) Various text fonts and image backgrounds. Legend: H = horizontal, MO = multi-oriented, HC = horizontal curve, VC = vertical curve, Cir = circular and W = wavy.

Focused scene text as a start. Two of the latest scene text datasets, COCO-text and ICDAR 2015, emerged to challenge current algorithms with incidental images. For example, the scene images in ICDAR 2015 [13] were captured without any prior effort to position the text in them. Although it was not stated explicitly, one can deduce two likely reasons these datasets emerged. (i) Algorithm performance on the earlier ICDARs had saturated at a rather high point, so a more complex dataset was deemed necessary. (ii) Devices in real-world scenarios are unlikely to capture well-focused scene text. Work on curved text detection, by contrast, is still rare, and we believe it is in its infancy.
Inspired by the improvement that focused scene text datasets, notably the ICDARs and MSRA-TD500, brought to scene text detection and recognition, we believe that focused rather than incidental scene text is more appropriate for kick-starting related research.

Tighter groundtruth is better. ICDAR 2015 employs quadrilaterals in its annotation to cater for perspective-distorted text [13]. However, COCO-text uses rectangular bounding boxes [14], like ICDAR 2013, which we think is a poor choice given the variety of text orientations it contains. Fig. 5 illustrates the downside of such bounding box annotation. The boxes cover much of the background, which makes them a poor groundtruth for both evaluation and training. In Total-Text, we annotated text regions with tightly fitting polygons, and the groundtruth is provided as polygon vertices.

Evaluation Protocol. Like the ICDAR datasets [12], Total-Text uses DetEval [23]. We modified the minimum intersection area calculation stage to handle our polygon-shaped groundtruth. The evaluation protocol will be made available as well.

Figure 7: Statistics of the Total-Text dataset. (a) Text instances per image; (b) text orientations per image; (c) curve variations; (d) occurrence of curved text.

Figure 8: Examples of pixel-level annotation (cropped) in Total-Text.

Annotation Details. Groundtruth in Total-Text is annotated at word-level granularity. Following COCO-text, a word is an uninterrupted sequence of characters delimited by spaces. As mentioned, Total-Text uses polygons to bound groundtruth words tightly. We also include rectangular bounding box annotations, since most current algorithms output rectangular bounding boxes. However, a rectangle represents curved text poorly, because the nature of curved text means it includes a big chunk of background area. Therefore, we do not encourage the use of the rectangular bounding boxes in our dataset. Total-Text considers only English characters in natural images; other languages, digital watermarks and unreadable text are labelled as do-not-care in the groundtruth. Do-not-care regions picked up by an algorithm should be filtered out before its performance is evaluated. Groundtruth for word recognition is also provided along with its spatial coordinates. In addition, the orientation of every instance is annotated for modularity. For example, to evaluate curved text detection only, one can use this annotation to filter out instances of other orientations. Last but not least, Total-Text also comes with binary mask groundtruth to meet recent requirements [9]-[11]. Fig. 1 illustrates all the aforementioned annotation details except the pixel-level annotation, which is illustrated in Fig. 8. Since the dataset is of manageable size, the authors annotated it entirely by hand and cross-checked it with three other laboratory members.

B. Dataset Statistics
This subsection discusses the statistics of Total-Text.
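The annotation fields described above (polygon vertices, transcription, orientation and a care flag) are enough to compute per-image statistics and to check the background argument against rectangles. The following is a minimal sketch under stated assumptions. The `Instance` record and its field names are hypothetical and do not reflect the official Total-Text groundtruth format. The statistics count only care instances.

```python
from dataclasses import dataclass


# Hypothetical record for one annotated word; field names are illustrative
# assumptions, not the official Total-Text groundtruth format.
@dataclass
class Instance:
    polygon: list        # [(x, y), ...] vertices of the tight polygon
    transcription: str
    orientation: str     # "horizontal", "multi-oriented" or "curved"
    care: bool = True    # False marks a do-not-care region


def polygon_area(polygon):
    """Shoelace formula; valid for any simple polygon, convex or not."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def rect_fill_ratio(polygon):
    """Share of the axis-aligned bounding rectangle actually covered by text."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    rect_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return polygon_area(polygon) / rect_area if rect_area > 0 else 0.0


def per_image_stats(images):
    """Average number of care instances and of distinct orientations per image."""
    n_instances = n_orientations = 0
    for instances in images:
        cared = [inst for inst in instances if inst.care]
        n_instances += len(cared)
        n_orientations += len({inst.orientation for inst in cared})
    return n_instances / len(images), n_orientations / len(images)


def keep_orientation(images, orientation):
    """Drop do-not-care regions and every instance of another orientation."""
    return [[inst for inst in instances
             if inst.care and inst.orientation == orientation]
            for instances in images]


# Usage on two toy images (coordinates are made up for illustration).
arc = [(0, 40), (50, 0), (100, 40), (100, 60), (50, 20), (0, 60)]  # bent band
box = [(0, 0), (80, 0), (80, 20), (0, 20)]
tilted = [(0, 20), (60, 0), (66, 18), (6, 38)]
images = [
    [Instance(arc, "CAFE", "curved"), Instance(box, "OPEN", "horizontal"),
     Instance(box, "", "horizontal", care=False)],
    [Instance(tilted, "EXIT", "multi-oriented")],
]
print(rect_fill_ratio(arc))          # fraction of the rectangle that is text
print(per_image_stats(images))       # (instances/image, orientations/image)
print(keep_orientation(images, "curved"))
```

For the bent band, the polygon covers only one third of its bounding rectangle, so two thirds of the rectangle would be background labelled as text. This is the effect Fig. 5 illustrates.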
All comparisons are made against ICDAR 2013 and MSRA-TD500, as they are the most common benchmarks for horizontal and multi-oriented focused scene text, respectively. Total-Text is split into a training set of 1,255 images and a testing set of 300 images.

Strength in numbers. Fig. 7 shows a series of statistics for Total-Text. It has a total of 9,330 annotated texts, an average of 6 instances per image. More than half of the images in Total-Text contain two or more orientations, yielding 1.8 orientations per image on average. Both numbers rank first against its competitors [12], showing the complexity of Total-Text. Beyond these numbers, the dataset was also collected with quality in mind. It includes scene complexity such as text-like and low-contrast backgrounds, as well as different font types and sizes (see Fig. 6b for image examples).

Orientation diversity. Approximately half of the text instances are curved, and the other half is split almost equally between horizontal and multi-oriented text. Curved text has its own variations too. Based on our observations, we classified it into horizontal curved, vertical curved, circular and wavy text (see Fig. 6a for image examples). Their composition in the dataset is shown in Fig. 7c. Although all images were collected with curved text in mind, other orientations still account for half of the total instances. A closer look at the dataset shows that curved text usually appears together with horizontal or multi-oriented text. This mixture of orientations within an image challenges text detection algorithms to achieve robustness and generalization in terms of text orientation.

Scene diversity. Most images in CUTE80 (the only other publicly available curved text dataset) are of football jerseys, whereas Total-Text is much more diversified. Fig. 7d shows where curved text usually appears. Business-related places such as restaurants (e.g., Nandos, Starbucks), company branding logos and merchant stores account for 61.2% of the curved text instances. Tourist spots such as parks (e.g., Beverly Hills in America), museums and landmarks (e.g., Harajuku in Japan) account for 21.1%. Fig. 2 illustrates these examples.

Figure 9: Visualization of the activations in the deconvolution network. The activation maps from top left to bottom right correspond to the output maps from lower to higher layers in the deconvolution network. We select the most representative activation in each layer for effective visualization. (a) Input image; (b) the last deconvolutional layer; (c) the unpooling layer; (d) the last deconvolutional layer; (e) the unpooling layer; (f) the last deconvolutional layer; (g) the unpooling layer; (h) the last deconvolutional layer; (i) the unpooling layer; and (j) the last deconvolutional layer.

IV. SEMANTIC SEGMENTATION FOR TEXT DETECTION
Inspired by the success of FCNs in semantic segmentation, [9]-[11] cast text detection as a segmentation problem and achieved state-of-the-art results. Most conventional algorithms fail to detect curved text. These segmentation-based algorithms have shown successful results on it, though only on a limited number of examples owing to the lack of an available benchmark. [10] achieved good results without the heuristic grouping rules that most other algorithms need, which intrigued us to look into this new breed of solution. We fine-tuned DeconvNet [24] and evaluated it on Total-Text; the following section discusses our findings.

A. DeconvNet
We selected DeconvNet [24] as our investigation tool for two reasons: (1) it achieved state-of-the-art semantic segmentation results on the PASCAL VOC dataset, and (2) its multiple deconvolutional layers let us observe, at a fine granularity, how the representation changes from layer to layer. Proposing a new solution to the curved text problem is beyond the scope of this paper, so we merely convert and fine-tune the network to localize text. For a complete understanding, readers are encouraged to read [24].

Conversion. The last convolutional layer of the original DeconvNet has 21 output channels: one for each of the 20 classes in the PASCAL VOC benchmark [25] plus one for the background. In this paper, we reduced it to two channels, representing text and non-text. We then fine-tuned the pre-trained model provided by Noh et al. [24] with a one-step training process instead of the two-step process described in the original paper. Apart from these changes and the training data, all other training settings were consistent with the original paper.

Training Data. Considering the depth of DeconvNet (29 convolutional layers and 252M parameters), we pre-trained it on the largest scene text dataset, COCO-text [14]. Images in COCO-text are categorized into legible and illegible text. We trained our network only on the legible text, as it closely resembles our dataset. Similar to [9], [10], we first generated binary masks, with 1 indicating text regions and 0 indicating background. Approximately 15k training images were cropped into 256x256 patches to suit the receptive field of DeconvNet. Patches with less than 10% text region were discarded to avoid an overwhelming amount of non-text data. This yielded roughly 200k training patches and 80k validation patches. The data were augmented on the fly during training with horizontal flipping and random cropping (to 224x224).

B. Experiments
Inference. The inference process was kept as simple as possible.
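The post-processing in this inference step has three parts: binarize the saliency map at 0.5, group text pixels by connected component analysis, and bound each group with a polygon. It can be sketched as follows. The paper does not specify the connectivity or how the polygons are fitted, so both are assumptions here. The sketch uses 8-connectivity and a hypothetical `column_polygon` helper that traces each component's top and bottom edges column by column. That tracing is only tight for components with one vertical run per column, which suits horizontally curved words but not, for example, circular text.

```python
from collections import deque

import numpy as np


def binarize(saliency, threshold=0.5):
    """Mark pixels whose text probability exceeds the threshold as text (True)."""
    return np.asarray(saliency) > threshold


def connected_components(mask):
    """Group 8-connected text pixels; returns one list of (row, col) per component."""
    height, width = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    components = []
    for row in range(height):
        for col in range(width):
            if not mask[row, col] or seen[row, col]:
                continue
            seen[row, col] = True
            queue, pixels = deque([(row, col)]), []
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny, nx] and not seen[ny, nx]):
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def column_polygon(pixels):
    """Trace the top edge left to right, then the bottom edge right to left.

    Vertices are (x, y) pixel coordinates. Assumes one vertical run per column.
    """
    top, bottom = {}, {}
    for y, x in pixels:
        top[x] = min(top.get(x, y), y)
        bottom[x] = max(bottom.get(x, y), y)
    columns = sorted(top)
    return ([(x, top[x]) for x in columns]
            + [(x, bottom[x]) for x in reversed(columns)])


def detect(saliency, threshold=0.5):
    """Saliency map -> one polygon per connected text region."""
    mask = binarize(saliency, threshold)
    return [column_polygon(p) for p in connected_components(mask)]


# Usage on a toy 8x12 saliency map with two word-like blobs.
saliency = np.zeros((8, 12))
saliency[1:3, 1:4] = 0.9    # word 1: rows 1-2, columns 1-3
saliency[5:7, 6:11] = 0.8   # word 2: rows 5-6, columns 6-10
saliency[4, 0] = 0.3        # below the threshold, so ignored
for polygon in detect(saliency):
    print(polygon)
```

Grouping is purely by pixel connectivity, so two words whose predicted masks touch collapse into a single component. This is consistent with the merged-word failures reported in the results.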
We resized input images to , then forward propagated them through DeconvNet. To generate the final detection result, the saliency map was binarized with a threshold of 0.5. Connected component analysis then grouped the text pixels (1s), and each group was bounded tightly with a polygon.

Table I: Evaluation of DeconvNet on Total-Text.
Dataset | Recall | Precision | F-score
Total-Text | 0.33 | 0.40 | 0.36

Figure 10: Successful examples of DeconvNet.
Figure 11: Failure examples of DeconvNet.
Figure 12: Examples where DeconvNet has lower confidence at both ends of the curved text.

Results. The outcomes were evaluated using our evaluation protocol and are listed in Table I. Going through each output saliency map, we found two consistent causes of these unsatisfactory results: (1) the network is not robust enough against challenging backgrounds, such as text on repeated patterns like bricks, gates and walls; and (2) multiple word candidates were grouped as one. Fig. 11 illustrates some failure examples. We suspect the network's robustness was affected by its training data. Loosely bounded training data, with background regions labelled as text, could have impaired the training process to a certain extent. Meanwhile, text detection algorithms commonly produce text-line-level output, and our pipeline lacks a segmentation step to split such output into words.

Deeper look into the network. As mentioned before, our primary intention was to investigate the performance of DeconvNet on text of all orientations. With no orientation assumption or heuristic grouping mechanism in the design, we managed to find candidates for text of all orientations, as illustrated in Fig. 10. We were curious about exactly what happens across the deconvolution network, and how. So we cropped a patch of an original image containing curved text, forward propagated it through the network, and observed the feature maps in several layers of the deconvolution network. As Fig. 9 shows, at the lower layers we can tell which parts of the feature map are highly activated.
As the layers proceed, finer details emerge, enriching the region of interest to the point where we can recognize the characters in it.

Spatial resolution of feature maps is crucial. Text detection systems such as [9], [10] adopt FCNs and skip connections in their convolutional networks. This design element preserves the spatial resolution of the feature maps, which in turn provides better contextual information for the pixel-wise prediction task. Similarly, DeconvNet combines unpooling layers with learnable upsampling convolution filters to infer larger feature maps layer after layer. As Fig. 10 shows, the resulting saliency map is high in resolution and depicts the actual shape or orientation of the detected text region. Only minimal post-processing is therefore required to retrieve text candidates from it.

Text line supervision is an interesting step forward. Fig. 12 illustrates several examples where the network is not confident about the shape of curved text regions. We believe this could be improved with the text line supervision leveraged in [10]. That work showed its F-score dropping from 0.84 to 0.5 when results were reported without the FTN.

V. CONCLUSION
This paper introduces a comprehensive scene text dataset, Total-Text, featuring curved text, the element missing from current scene text datasets. We believe curved text should be included as part of the multi-oriented text detection problem. While it is currently under-researched, we hope the availability of Total-Text can change this. We fine-tuned DeconvNet and analyzed how it responds to curved text. The spatial resolution of feature maps and contextual information appear to be crucial in segmentation-based methods, which can predict text regions of all orientations without hard-coded rules. Inspired by this observation, we plan to explore this area further, aiming to design a scene text detector that is effective against multi-oriented text.

ACKNOWLEDGMENT
This work is partly supported by the Postgraduate Research Grant (PPP) - PG A, from the University of Malaya. The Titan X GPU used in this research was donated by NVIDIA Corporation. We would also like to express our gratitude to Jia Huei Tan, Yang Loong Chang and Yuen Peng Loh for Total-Text image collection and annotation.

REFERENCES
[1] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. G. i Bigorda, S. R. Mestre, J. Mas, D. F. Mota, J. A. Almazan, and L. P. de las Heras, "ICDAR 2013 robust reading competition," in ICDAR.
[2] C. Yao, X. Bai, W. Liu, Y. Ma, and Z. Tu, "Detecting texts of arbitrary orientations in natural images," in CVPR.
[3] Z. Zhang, W. Shen, C. Yao, and X. Bai, "Symmetry-based text line detection in natural scenes," in CVPR.
[4] W. Huang, Z. Lin, J. Yang, and J. Wang, "Text localization in natural images using stroke feature transform and text covariance descriptors," in ICCV.
[5] L. Neumann and J. Matas, "Scene text localization and recognition with oriented stroke detection," in ICCV.
[6] W. Huang, Y. Qiao, and X. Tang, "Robust scene text detection with convolution neural network induced MSER trees," in ECCV.
[7] B. Shi, X. Bai, and S. Belongie, "Detecting oriented text in natural images by linking segments," in CVPR.
[8] A. Risnumawan, P. Shivakumara, C. S. Chan, and C. L. Tan, "A robust arbitrary text detection system for natural scene images," Expert Systems with Applications, vol. 41, no. 18.
[9] Z. Zhang, C. Zhang, W. Shen, C. Yao, W. Liu, and X. Bai, "Multi-oriented text detection with fully convolutional networks," in CVPR.
[10] T. He, W. Huang, Y. Qiao, and J. Yao, "Accurate text localization in natural image with cascaded convolutional text network," arXiv preprint.
[11] C. Yao, X. Bai, N. Sang, X. Zhou, S. Zhou, and Z. Cao, "Scene text detection via holistic, multi-channel prediction," arXiv preprint.
[12] Q. Ye and D. Doermann, "Text detection and recognition in images and video: a survey," T-PAMI, vol. 37, no. 7, pp. 1-20, 2014.
[13] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. Ghosh, A. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu et al., "ICDAR 2015 competition on robust reading," in ICDAR.
[14] A. Veit, T. Matera, L. Neumann, J. Matas, and S. Belongie, "COCO-Text: Dataset and benchmark for text detection and recognition in natural images," arXiv preprint.
[15] B. Epshtein, E. Ofek, and Y. Wexler, "Detecting text in natural scenes with stroke width transform," in CVPR.
[16] J. Matas, O. Chum, M. Urban, and T. Pajdla, "Robust wide-baseline stereo from maximally stable extremal regions," Image and Vision Computing, vol. 22, no. 10.
[17] T. Wang, D. J. Wu, A. Coates, and A. Y. Ng, "End-to-end text recognition with convolutional neural networks," in ICPR.
[18] M. Jaderberg, A. Vedaldi, and A. Zisserman, "Deep features for text spotting," in ECCV.
[19] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in CVPR.
[20] X.-C. Yin, W.-Y. Pei, J. Zhang, and H.-W. Hao, "Multi-orientation scene text detection with adaptive clustering," T-PAMI, vol. 37, no. 9.
[21] R. B. Adams, The Science of Social Vision. Oxford University Press, 2011, vol. 7.
[22] X.-C. Yin, X. Yin, K. Huang, and H.-W. Hao, "Robust text detection in natural scene images," T-PAMI, vol. 36, no. 5.
[23] C. Wolf and J.-M. Jolion, "Object count/area graphs for the evaluation of object detection and segmentation algorithms," IJDAR, vol. 8, no. 4.
[24] H. Noh, S. Hong, and B. Han, "Learning deconvolution network for semantic segmentation," in ICCV.
[25] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, "The PASCAL visual object classes (VOC) challenge," International Journal of Computer Vision, vol. 88, no. 2.
Photo OCR (2017-2018) Xiang Bai Huazhong University of Science and Technology Outline VALSE2018, DaLian Xiang Bai 2 Deep Direct Regression for Multi-Oriented Scene Text Detection [He et al., ICCV, 2017.]
More informationAn Efficient Method to Extract Digital Text From Scanned Image Text
An Efficient Method to Extract Digital Text From Scanned Image Text Jenick Johnson ECE Dept., Christ the King Engineering College Coimbatore-641104, Tamil Nadu, India Suresh Babu. V ECE Dept., Christ the
More informationarxiv: v2 [cs.cv] 10 Jul 2017
EAST: An Efficient and Accurate Scene Text Detector Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, and Jiajun Liang Megvii Technology Inc., Beijing, China {zxy, yaocong, wenhe, wangyuzhi,
More informationAccurate Scene Text Detection through Border Semantics Awareness and Bootstrapping
Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping Chuhui Xue [0000 0002 3562 3094], Shijian Lu [0000 0002 6766 2506], and Fangneng Zhan [0000 0003 1502 6847] School of
More informationFeature Fusion for Scene Text Detection
2018 13th IAPR International Workshop on Document Analysis Systems Feature Fusion for Scene Text Detection Zhen Zhu, Minghui Liao, Baoguang Shi, Xiang Bai School of Electronic Information and Communications
More informationChannel Locality Block: A Variant of Squeeze-and-Excitation
Channel Locality Block: A Variant of Squeeze-and-Excitation 1 st Huayu Li Northern Arizona University Flagstaff, United State Northern Arizona University hl459@nau.edu arxiv:1901.01493v1 [cs.lg] 6 Jan
More informationTowards Visual Words to Words
Towards Visual Words to Words Text Detection with a General Bag of Words Representation Rakesh Mehta Dept. of Signal Processing, Tampere Univ. of Technology in Tampere Ondřej Chum, Jiří Matas Centre for
More informationTextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes
TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes Shangbang Long 1,2[0000 0002 4089 5369], Jiaqiang Ruan 1,2, Wenjie Zhang 1,2,,Xin He 2, Wenhao Wu 2, Cong Yao 2[0000 0001 6564
More informationTRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK
TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK 1 Po-Jen Lai ( 賴柏任 ), 2 Chiou-Shann Fuh ( 傅楸善 ) 1 Dept. of Electrical Engineering, National Taiwan University, Taiwan 2 Dept.
More informationAvailable online at ScienceDirect. Procedia Computer Science 96 (2016 )
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 96 (2016 ) 1409 1417 20th International Conference on Knowledge Based and Intelligent Information and Engineering Systems,
More informationContent-Based Image Recovery
Content-Based Image Recovery Hong-Yu Zhou and Jianxin Wu National Key Laboratory for Novel Software Technology Nanjing University, China zhouhy@lamda.nju.edu.cn wujx2001@nju.edu.cn Abstract. We propose
More informationSSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang
SSD: Single Shot MultiBox Detector Author: Wei Liu et al. Presenter: Siyu Jiang Outline 1. Motivations 2. Contributions 3. Methodology 4. Experiments 5. Conclusions 6. Extensions Motivation Motivation
More informationarxiv: v1 [cs.cv] 2 Jan 2019
Detecting Text in the Wild with Deep Character Embedding Network Jiaming Liu, Chengquan Zhang, Yipeng Sun, Junyu Han, and Errui Ding Baidu Inc, Beijing, China. {liujiaming03,zhangchengquan,yipengsun,hanjunyu,dingerrui}@baidu.com
More informationEfficient Segmentation-Aided Text Detection For Intelligent Robots
Efficient Segmentation-Aided Text Detection For Intelligent Robots Junting Zhang, Yuewei Na, Siyang Li, C.-C. Jay Kuo University of Southern California Outline Problem Definition and Motivation Related
More informationScene Text Detection Using Machine Learning Classifiers
601 Scene Text Detection Using Machine Learning Classifiers Nafla C.N. 1, Sneha K. 2, Divya K.P. 3 1 (Department of CSE, RCET, Akkikkvu, Thrissur) 2 (Department of CSE, RCET, Akkikkvu, Thrissur) 3 (Department
More informationSupplementary Material: Unconstrained Salient Object Detection via Proposal Subset Optimization
Supplementary Material: Unconstrained Salient Object via Proposal Subset Optimization 1. Proof of the Submodularity According to Eqns. 10-12 in our paper, the objective function of the proposed optimization
More informationarxiv: v2 [cs.cv] 27 Feb 2018
Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation arxiv:1802.08948v2 [cs.cv] 27 Feb 2018 Pengyuan Lyu 1, Cong Yao 2, Wenhao Wu 2, Shuicheng Yan 3, Xiang Bai 1 1 Huazhong
More informationDetecting Printed and Handwritten Partial Copies of Line Drawings Embedded in Complex Backgrounds
9 1th International Conference on Document Analysis and Recognition Detecting Printed and Handwritten Partial Copies of Line Drawings Embedded in Complex Backgrounds Weihan Sun, Koichi Kise Graduate School
More informationStructured Prediction using Convolutional Neural Networks
Overview Structured Prediction using Convolutional Neural Networks Bohyung Han bhhan@postech.ac.kr Computer Vision Lab. Convolutional Neural Networks (CNNs) Structured predictions for low level computer
More informationYOLO9000: Better, Faster, Stronger
YOLO9000: Better, Faster, Stronger Date: January 24, 2018 Prepared by Haris Khan (University of Toronto) Haris Khan CSC2548: Machine Learning in Computer Vision 1 Overview 1. Motivation for one-shot object
More informationLatest development in image feature representation and extraction
International Journal of Advanced Research and Development ISSN: 2455-4030, Impact Factor: RJIF 5.24 www.advancedjournal.com Volume 2; Issue 1; January 2017; Page No. 05-09 Latest development in image
More informationarxiv: v1 [cs.cv] 31 Mar 2016
Object Boundary Guided Semantic Segmentation Qin Huang, Chunyang Xia, Wenchao Zheng, Yuhang Song, Hao Xu and C.-C. Jay Kuo arxiv:1603.09742v1 [cs.cv] 31 Mar 2016 University of Southern California Abstract.
More informationDeep Direct Regression for Multi-Oriented Scene Text Detection
Deep Direct Regression for Multi-Oriented Scene Text Detection Wenhao He 1,2 Xu-Yao Zhang 1 Fei Yin 1 Cheng-Lin Liu 1,2 1 National Laboratory of Pattern Recognition (NLPR) Institute of Automation, Chinese
More informationABSTRACT 1. INTRODUCTION 2. RELATED WORK
Improving text recognition by distinguishing scene and overlay text Bernhard Quehl, Haojin Yang, Harald Sack Hasso Plattner Institute, Potsdam, Germany Email: {bernhard.quehl, haojin.yang, harald.sack}@hpi.de
More informationAggregating Local Context for Accurate Scene Text Detection
Aggregating Local Context for Accurate Scene Text Detection Dafang He 1, Xiao Yang 2, Wenyi Huang, 1, Zihan Zhou 1, Daniel Kifer 2, and C.Lee Giles 1 1 Information Science and Technology, Penn State University
More informationA process for text recognition of generic identification documents over cloud computing
142 Int'l Conf. IP, Comp. Vision, and Pattern Recognition IPCV'16 A process for text recognition of generic identification documents over cloud computing Rodolfo Valiente, Marcelo T. Sadaike, José C. Gutiérrez,
More informationSupplementary Material: Pixelwise Instance Segmentation with a Dynamically Instantiated Network
Supplementary Material: Pixelwise Instance Segmentation with a Dynamically Instantiated Network Anurag Arnab and Philip H.S. Torr University of Oxford {anurag.arnab, philip.torr}@eng.ox.ac.uk 1. Introduction
More informationAn Approach to Detect Text and Caption in Video
An Approach to Detect Text and Caption in Video Miss Megha Khokhra 1 M.E Student Electronics and Communication Department, Kalol Institute of Technology, Gujarat, India ABSTRACT The video image spitted
More informationReal-time Object Detection CS 229 Course Project
Real-time Object Detection CS 229 Course Project Zibo Gong 1, Tianchang He 1, and Ziyi Yang 1 1 Department of Electrical Engineering, Stanford University December 17, 2016 Abstract Objection detection
More informationAn Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches
An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches Zhuoyao Zhong 1,2,*, Lei Sun 2, Qiang Huo 2 1 School of EIE., South China University of Technology, Guangzhou, China
More informationWordSup: Exploiting Word Annotations for Character based Text Detection
WordSup: Exploiting Word Annotations for Character based Text Detection Han Hu 1 Chengquan Zhang 2 Yuxuan Luo 2 Yuzhuo Wang 2 Junyu Han 2 Errui Ding 2 Microsoft Research Asia 1 IDL, Baidu Research 2 hanhu@microsoft.com
More informationScene Text Recognition for Augmented Reality. Sagar G V Adviser: Prof. Bharadwaj Amrutur Indian Institute Of Science
Scene Text Recognition for Augmented Reality Sagar G V Adviser: Prof. Bharadwaj Amrutur Indian Institute Of Science Outline Research area and motivation Finding text in natural scenes Prior art Improving
More informationGeometry-aware Traffic Flow Analysis by Detection and Tracking
Geometry-aware Traffic Flow Analysis by Detection and Tracking 1,2 Honghui Shi, 1 Zhonghao Wang, 1,2 Yang Zhang, 1,3 Xinchao Wang, 1 Thomas Huang 1 IFP Group, Beckman Institute at UIUC, 2 IBM Research,
More informationCAP 6412 Advanced Computer Vision
CAP 6412 Advanced Computer Vision http://www.cs.ucf.edu/~bgong/cap6412.html Boqing Gong April 21st, 2016 Today Administrivia Free parameters in an approach, model, or algorithm? Egocentric videos by Aisha
More informationDeep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks
Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Si Chen The George Washington University sichen@gwmail.gwu.edu Meera Hahn Emory University mhahn7@emory.edu Mentor: Afshin
More informationA FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen
A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS Kuan-Chuan Peng and Tsuhan Chen School of Electrical and Computer Engineering, Cornell University, Ithaca, NY
More informationText Detection and Extraction from Natural Scene: A Survey Tajinder Kaur 1 Post-Graduation, Department CE, Punjabi University, Patiala, Punjab India
Volume 3, Issue 3, March 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com ISSN:
More informationMULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou
MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK Wenjie Guan, YueXian Zou*, Xiaoqun Zhou ADSPLAB/Intelligent Lab, School of ECE, Peking University, Shenzhen,518055, China
More informationAutomatically Algorithm for Physician s Handwritten Segmentation on Prescription
Automatically Algorithm for Physician s Handwritten Segmentation on Prescription Narumol Chumuang 1 and Mahasak Ketcham 2 Department of Information Technology, Faculty of Information Technology, King Mongkut's
More informationMulti-script Text Extraction from Natural Scenes
Multi-script Text Extraction from Natural Scenes Lluís Gómez and Dimosthenis Karatzas Computer Vision Center Universitat Autònoma de Barcelona Email: {lgomez,dimos}@cvc.uab.es Abstract Scene text extraction
More informationCS231A Course Project Final Report Sign Language Recognition with Unsupervised Feature Learning
CS231A Course Project Final Report Sign Language Recognition with Unsupervised Feature Learning Justin Chen Stanford University justinkchen@stanford.edu Abstract This paper focuses on experimenting with
More informationLEVERAGING SURROUNDING CONTEXT FOR SCENE TEXT DETECTION
LEVERAGING SURROUNDING CONTEXT FOR SCENE TEXT DETECTION Yao Li 1, Chunhua Shen 1, Wenjing Jia 2, Anton van den Hengel 1 1 The University of Adelaide, Australia 2 University of Technology, Sydney, Australia
More informationA Hierarchical Visual Saliency Model for Character Detection in Natural Scenes
A Hierarchical Visual Saliency Model for Character Detection in Natural Scenes Renwu Gao 1, Faisal Shafait 2, Seiichi Uchida 3, and Yaokai Feng 3 1 Information Sciene and Electrical Engineering, Kyushu
More informationDetecting and Recognizing Text in Natural Images using Convolutional Networks
Detecting and Recognizing Text in Natural Images using Convolutional Networks Aditya Srinivas Timmaraju, Vikesh Khanna Stanford University Stanford, CA - 94305 adityast@stanford.edu, vikesh@stanford.edu
More informationReading Text in the Wild from Compressed Images
Reading Text in the Wild from Compressed Images Leonardo Galteri leonardo.galteri@unifi.it Marco Bertini marco.bertini@unifi.it Dimosthenis Karatzas CVC, Barcelona dimos@cvc.uab.es Dena Bazazian CVC, Barcelona
More informationScene text recognition: no country for old men?
Scene text recognition: no country for old men? Lluís Gómez and Dimosthenis Karatzas Computer Vision Center Universitat Autònoma de Barcelona Email: {lgomez,dimos}@cvc.uab.es Abstract. It is a generally
More informationA Laplacian Based Novel Approach to Efficient Text Localization in Grayscale Images
A Laplacian Based Novel Approach to Efficient Text Localization in Grayscale Images Karthik Ram K.V & Mahantesh K Department of Electronics and Communication Engineering, SJB Institute of Technology, Bangalore,
More informationTextField: Learning A Deep Direction Field for Irregular Scene Text Detection
1 TextField: Learning A Deep Direction Field for Irregular Scene Text Detection Yongchao Xu, Yukang Wang, Wei Zhou, Yongpan Wang, Zhibo Yang, Xiang Bai, Senior Member, IEEE arxiv:1812.01393v1 [cs.cv] 4
More informationRobust Face Recognition Based on Convolutional Neural Network
2017 2nd International Conference on Manufacturing Science and Information Engineering (ICMSIE 2017) ISBN: 978-1-60595-516-2 Robust Face Recognition Based on Convolutional Neural Network Ying Xu, Hui Ma,
More informationDeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs Zhipeng Yan, Moyuan Huang, Hao Jiang 5/1/2017 1 Outline Background semantic segmentation Objective,
More informationFast scene understanding and prediction for autonomous platforms. Bert De Brabandere, KU Leuven, October 2017
Fast scene understanding and prediction for autonomous platforms Bert De Brabandere, KU Leuven, October 2017 Who am I? MSc in Electrical Engineering at KU Leuven, Belgium Last year PhD student with Luc
More informationarxiv: v1 [cs.cv] 4 Dec 2017
Enhanced Characterness for Text Detection in the Wild Aarushi Agrawal 2, Prerana Mukherjee 1, Siddharth Srivastava 1, and Brejesh Lall 1 arxiv:1712.04927v1 [cs.cv] 4 Dec 2017 1 Department of Electrical
More informationarxiv: v1 [cs.cv] 16 Nov 2015
Coarse-to-fine Face Alignment with Multi-Scale Local Patch Regression Zhiao Huang hza@megvii.com Erjin Zhou zej@megvii.com Zhimin Cao czm@megvii.com arxiv:1511.04901v1 [cs.cv] 16 Nov 2015 Abstract Facial
More informationDeep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia
Deep learning for dense per-pixel prediction Chunhua Shen The University of Adelaide, Australia Image understanding Classification error Convolution Neural Networks 0.3 0.2 0.1 Image Classification [Krizhevsky
More informationRotation-sensitive Regression for Oriented Scene Text Detection
Rotation-sensitive Regression for Oriented Scene Text Detection Minghui Liao 1, Zhen Zhu 1, Baoguang Shi 1, Gui-song Xia 2, Xiang Bai 1 1 Huazhong University of Science and Technology 2 Wuhan University
More informationJOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS. Zhao Chen Machine Learning Intern, NVIDIA
JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS Zhao Chen Machine Learning Intern, NVIDIA ABOUT ME 5th year PhD student in physics @ Stanford by day, deep learning computer vision scientist
More informationTEXTS in scenes contain high level semantic information
1 ESIR: End-to-end Scene Text Recognition via Iterative Rectification Fangneng Zhan and Shijian Lu arxiv:1812.05824v1 [cs.cv] 14 Dec 2018 Abstract Automated recognition of various texts in scenes has been
More informationarxiv: v1 [cs.cv] 22 Aug 2017
WordSup: Exploiting Word Annotations for Character based Text Detection Han Hu 1 Chengquan Zhang 2 Yuxuan Luo 2 Yuzhuo Wang 2 Junyu Han 2 Errui Ding 2 Microsoft Research Asia 1 IDL, Baidu Research 2 hanhu@microsoft.com
More informationReading Text in the Wild from Compressed Images
Reading Text in the Wild from Compressed Images Leonardo Galteri University of Florence leonardo.galteri@unifi.it Marco Bertini University of Florence marco.bertini@unifi.it Dimosthenis Karatzas CVC, Barcelona
More informationFully Convolutional Networks for Semantic Segmentation
Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell UC Berkeley Chaim Ginzburg for Deep Learning seminar 1 Semantic Segmentation Define a pixel-wise labeling
More informationText Extraction from Natural Scene Images and Conversion to Audio in Smart Phone Applications
Text Extraction from Natural Scene Images and Conversion to Audio in Smart Phone Applications M. Prabaharan 1, K. Radha 2 M.E Student, Department of Computer Science and Engineering, Muthayammal Engineering
More informationDetecting Oriented Text in Natural Images by Linking Segments
Detecting Oriented Text in Natural Images by Linking Segments Baoguang Shi 1 Xiang Bai 1 Serge Belongie 2 1 School of EIC, Huazhong University of Science and Technology 2 Department of Computer Science,
More information12/12 A Chinese Words Detection Method in Camera Based Images Qingmin Chen, Yi Zhou, Kai Chen, Li Song, Xiaokang Yang Institute of Image Communication
A Chinese Words Detection Method in Camera Based Images Qingmin Chen, Yi Zhou, Kai Chen, Li Song, Xiaokang Yang Institute of Image Communication and Information Processing, Shanghai Key Laboratory Shanghai
More informationTraffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers
Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers A. Salhi, B. Minaoui, M. Fakir, H. Chakib, H. Grimech Faculty of science and Technology Sultan Moulay Slimane
More informationCorrecting User Guided Image Segmentation
Correcting User Guided Image Segmentation Garrett Bernstein (gsb29) Karen Ho (ksh33) Advanced Machine Learning: CS 6780 Abstract We tackle the problem of segmenting an image into planes given user input.
More informationDETECTING TEXTUAL INFORMATION IN IMAGES FROM ONION DOMAINS USING TEXT SPOTTING
Actas de las XXXIX Jornadas de Automática, Badajoz, 5-7 de Septiembre de 2018 DETECTING TEXTUAL INFORMATION IN IMAGES FROM ONION DOMAINS USING TEXT SPOTTING Pablo Blanco Dept. IESA. Universidad de León,
More informationAndrei Polzounov (Universitat Politecnica de Catalunya, Barcelona, Spain), Artsiom Ablavatski (A*STAR Institute for Infocomm Research, Singapore),
WordFences: Text Localization and Recognition ICIP 2017 Andrei Polzounov (Universitat Politecnica de Catalunya, Barcelona, Spain), Artsiom Ablavatski (A*STAR Institute for Infocomm Research, Singapore),
More informationVerisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes
Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes Fangneng Zhan 1[0000 0003 1502 6847], Shijian Lu 2[0000 0002 6766 2506], and Chuhui Xue 3[0000 0002 3562 3094] School
More informationTHE SPEED-LIMIT SIGN DETECTION AND RECOGNITION SYSTEM
THE SPEED-LIMIT SIGN DETECTION AND RECOGNITION SYSTEM Kuo-Hsin Tu ( 塗國星 ), Chiou-Shann Fuh ( 傅楸善 ) Dept. of Computer Science and Information Engineering, National Taiwan University, Taiwan E-mail: p04922004@csie.ntu.edu.tw,
More informationScene Text Detection via Holistic, Multi-Channel Prediction
Scene Text Detection via Holistic, Multi-Channel Prediction Cong Yao1,2, Xiang Bai1, Nong Sang1, Xinyu Zhou2, Shuchang Zhou2, Zhimin Cao2 1 arxiv:1606.09002v1 [cs.cv] 29 Jun 2016 Huazhong University of
More informationSpeeding up the Detection of Line Drawings Using a Hash Table
Speeding up the Detection of Line Drawings Using a Hash Table Weihan Sun, Koichi Kise 2 Graduate School of Engineering, Osaka Prefecture University, Japan sunweihan@m.cs.osakafu-u.ac.jp, 2 kise@cs.osakafu-u.ac.jp
More informationarxiv: v3 [cs.cv] 2 Jun 2017
Incorporating the Knowledge of Dermatologists to Convolutional Neural Networks for the Diagnosis of Skin Lesions arxiv:1703.01976v3 [cs.cv] 2 Jun 2017 Iván González-Díaz Department of Signal Theory and
More informationMask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma
Mask R-CNN presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma Mask R-CNN Background Related Work Architecture Experiment Mask R-CNN Background Related Work Architecture Experiment Background From left
More informationLayout Segmentation of Scanned Newspaper Documents
, pp-05-10 Layout Segmentation of Scanned Newspaper Documents A.Bandyopadhyay, A. Ganguly and U.Pal CVPR Unit, Indian Statistical Institute 203 B T Road, Kolkata, India. Abstract: Layout segmentation algorithms
More informationYiqi Yan. May 10, 2017
Yiqi Yan May 10, 2017 P a r t I F u n d a m e n t a l B a c k g r o u n d s Convolution Single Filter Multiple Filters 3 Convolution: case study, 2 filters 4 Convolution: receptive field receptive field
More informationTextBoxes++: A Single-Shot Oriented Scene Text Detector
1 TextBoxes++: A Single-Shot Oriented Scene Text Detector Minghui Liao, Baoguang Shi, Xiang Bai, Senior Member, IEEE arxiv:1801.02765v3 [cs.cv] 27 Apr 2018 Abstract Scene text detection is an important
More informationICDAR 2013 Robust Reading Competition
ICDAR 2013 Robust Reading Competition D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. Gomez, S. Robles, J. Mas, D. Fernandez, J. Almazán, L.P. de las Heras http://dag.cvc.uab.es/icdar2013competition/
More informationTHE automated understanding of textual information in
MANUSCRIPT PREPRINT, JULY 2014 1 A Fast Hierarchical Method for Multi-script and Arbitrary Oriented Scene Text Extraction Lluis Gomez, and Dimosthenis Karatzas, Member, IEEE arxiv:1407.7504v1 [cs.cv] 28
More informationA Study of Vehicle Detector Generalization on U.S. Highway
26 IEEE 9th International Conference on Intelligent Transportation Systems (ITSC) Windsor Oceanico Hotel, Rio de Janeiro, Brazil, November -4, 26 A Study of Vehicle Generalization on U.S. Highway Rakesh
More informationImproving Latent Fingerprint Matching Performance by Orientation Field Estimation using Localized Dictionaries
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 11, November 2014,
More informationTranslation Symmetry Detection: A Repetitive Pattern Analysis Approach
2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops Translation Symmetry Detection: A Repetitive Pattern Analysis Approach Yunliang Cai and George Baciu GAMA Lab, Department of Computing
More informationSupplementary Material for Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains
Supplementary Material for Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains Jiahao Pang 1 Wenxiu Sun 1 Chengxi Yang 1 Jimmy Ren 1 Ruichao Xiao 1 Jin Zeng 1 Liang Lin 1,2 1 SenseTime Research
More information[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors
[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors Junhyug Noh Soochan Lee Beomsu Kim Gunhee Kim Department of Computer Science and Engineering
More informationEfficient indexing for Query By String text retrieval
Efficient indexing for Query By String text retrieval Suman K. Ghosh Lluís, Gómez, Dimosthenis Karatzas and Ernest Valveny Computer Vision Center, Dept. Ciències de la Computació Universitat Autònoma de
More informationExtend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network
Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network Liwen Zheng, Canmiao Fu, Yong Zhao * School of Electronic and Computer Engineering, Shenzhen Graduate School of
More informationOTCYMIST: Otsu-Canny Minimal Spanning Tree for Born-Digital Images
OTCYMIST: Otsu-Canny Minimal Spanning Tree for Born-Digital Images Deepak Kumar and A G Ramakrishnan Medical Intelligence and Language Engineering Laboratory Department of Electrical Engineering, Indian
More informationDeconvolutions in Convolutional Neural Networks
Overview Deconvolutions in Convolutional Neural Networks Bohyung Han bhhan@postech.ac.kr Computer Vision Lab. Convolutional Neural Networks (CNNs) Deconvolutions in CNNs Applications Network visualization
More informationarxiv: v1 [cs.cv] 6 Dec 2017
Detecting Curve Text in the Wild: New Dataset and New Solution Yuliang Liu, Lianwen Jin, Shuaitao Zhang, Sheng Zhang College of Electronic Information Engineering South China University of Technology liu.yuliang@mail.scut.edu.cn;
More informationCS231N Project Final Report - Fast Mixed Style Transfer
CS231N Project Final Report - Fast Mixed Style Transfer Xueyuan Mei Stanford University Computer Science xmei9@stanford.edu Fabian Chan Stanford University Computer Science fabianc@stanford.edu Tianchang
More information