Comparing Tesseract Results with and without Character Localization for a Smartphone Application

Snehal Charjan (M.Tech 2nd year), Prof. R. V. Mante (Asst. Professor), Dr. P. N. Chatur (Head of Department)
snehalcharjan@gmail.com, mante.ravi@gmail.com, chatur.prashant@gcoea.ac.in
Department of Computer Science and Engineering, Government College of Engineering, Amravati (Maharashtra), India.

Abstract: Tesseract is considered the most accurate free OCR engine in existence. In an Android-based Smartphone application, images taken from the camera of the mobile device or browsed from the gallery are preprocessed, and the text in these images is accurately localized on the device using a dedicated localization method. The localized text sub-image is then fed to the Tesseract OCR engine for text extraction. In this paper, Tesseract results with and without character localization are compared on the basis of computation time in milliseconds. Each image is processed 10 times, the time of each run is recorded, and the computation time is reported as the average of these 10 values. There is a drastic difference in both time and accuracy between localized and non-localized images. We conclude by discussing the importance of localization in an OCR system, especially for Smartphone applications where only a few words are OCR'd and high accuracy is required.

Keywords: Pre-processing, localization, OCR, Android, Smartphone

Introduction

An Android-based image text search application is developed that recognizes the text captured by a mobile phone camera and displays the web search result on the screen of the phone. The user can thus obtain current information about a product, place, or signboard. The only assumption we make regarding the text is that it is written on a more or less uniform background in a contrasting color. [1]

OCR stands for Optical Character Recognition, the process of taking an image, interpreting it, and obtaining textual data from it. OCR was developed to translate scanned images of handwritten, typewritten, or printed text into machine-encoded text, and a lot of OCR software has been developed to accomplish this mission. Tesseract, which is used in this work, is one of the most accurate open-source OCR engines currently available. [2] After lying dormant for more than 10 years, Tesseract is now behind the leading commercial engines in terms of its accuracy. Its key strength is probably its unusual choice of features; its key weakness is probably its use of a polygonal approximation as input to the classifier instead of the raw outlines. [3]

Many objects in natural images, such as tree branches or electrical wires, are easily confused for text by existing optical character recognition (OCR) algorithms. For this reason, applying OCR to an unprocessed natural image is computationally expensive and may produce erroneous results. Hence, robust and efficient methods are needed to identify the text-containing regions within natural images before performing OCR. [1]

I. Methodology

The handheld device, equipped with a 3-megapixel camera, a 330 MHz CPU, and 128 MB of RAM, acquires images of text that are readable by a human viewer. Images taken from the camera of the mobile device or browsed from the gallery are pre-processed, and the text in these images is accurately localized on the device in a fraction of a second. The localized text sub-image is fed to the OCR engine for text extraction, and the resulting text is submitted as a web search query; the result is displayed in the browser of the mobile. In order to identify signs, we begin by locating seed points in areas with homogeneous luminance. In order to

www.ijrcct.org Page 298
do this, the image is first divided into a grid of K×K non-overlapping blocks, where K is a power of two. Each block is tested for homogeneity to determine whether it is part of the background of a sign.

II. Proposed Scheme

Fig. 1. Graphical representation of the M = 3 weight matrices used to quantify homogeneity for each K×K block in the image. The colors white and black represent +1 and -1, respectively, in each K×K region.

The homogeneity of each block is calculated, and blocks which meet a given threshold are labeled as homogeneous; the set of pixels from these homogeneous blocks is then used as seed points. We determine the homogeneity of a block as follows. For each block, let x be the vector of dimension K² containing the luminance values of the block. We compute M homogeneity features for each block, given by

δ^(m) = (2/K²) |Σᵢ xᵢ wᵢ^(m)|

where w^(m), for 0 ≤ m < M, is a weight vector with binary entries (i.e., each entry is ±1) that sum to zero. With this scaling, δ^(m) falls into the same range as the pixels: since the maximum intensity difference between individual pixels is 255, the average difference in pixel intensity, δ^(m), is a value in the range 0 to 255. We use M = 3 features corresponding to the three binary weight vectors shown in Fig. 1. Notice that each of the three features quantifies the variation of pixel values within a block, with a smaller value of δ^(m) signifying a more homogeneous texture. After all values have been computed, we classify a block as homogeneous if the norm of the vector δ = (δ^(0), …, δ^(M-1))^T is less than a chosen threshold T_u and at least one of its four neighboring blocks also meets the same condition. The seed points are exactly the set of all pixels in the homogeneous blocks. Smaller block sizes have the advantage of identifying smaller homogeneous regions, while larger blocks are less susceptible to noise. Notice that homogeneous blocks are generally not located in areas where edges reside. [1]

Fig.
2. Software architecture.

Figure 2 is an overview of the software architecture, divided into boxes that represent portions of code called Activities. A specific Activity communicates through an Intent, represented by the lines relating the Activities in Fig. 2; inside each Activity are the functions that operate on that particular Activity. Activity and Intent, the fundamental components of an Android application, are shown. The Home Activity is the first screen in the application, and the user can choose to acquire images either through the file system via the Gallery Activity or through the camera Preview Activity. The Gallery Activity is built into the operating system and only required coding of the Intent to retrieve image files. The Preview Activity contains code to preview images through the camera before the Capture Intent is sent upon pressing the image capture button. Upon Capture or Open, each sends a specific Intent to the Localize and OCR function in the Home Activity, where the image processing occurs and editable text is displayed on the screen of the Home Activity.

III. Results Analysis

The computation time was analyzed by processing each image 10 times and taking the average value, as shown in the tables below. Table 1 shows the non-localized image results, while Table 2 shows the localized image results from Tesseract.
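The timing methodology described above (each image processed 10 times, with the mean reported in milliseconds) can be sketched as follows. This is a minimal sketch: the `avg_ms` helper is ours, and the `pytesseract` calls and the `locate_text` localizer in the commented usage are illustrative assumptions, not the paper's actual code.

```python
import time

def avg_ms(fn, *args, runs=10):
    """Average wall-clock time of fn(*args) over `runs` calls, in milliseconds.

    Mirrors the paper's measurement: each image is OCR'd 10 times and
    the mean of the 10 timings is reported.
    """
    total = 0.0
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(*args)                       # run the OCR (or any callable) once
        total += (time.perf_counter() - t0) * 1000.0
    return total / runs

# Illustrative use (pytesseract and locate_text are assumptions):
#   import pytesseract
#   from PIL import Image
#   img = Image.open("sign.jpg")
#   t_full = avg_ms(pytesseract.image_to_string, img)   # non-localized
#   box = locate_text(img)   # hypothetical localizer -> (x0, y0, x1, y1)
#   t_crop = avg_ms(pytesseract.image_to_string, img.crop(box))  # localized
```

The per-image values reported in Tables 1 and 2 are averages of exactly this kind.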
Table 1. Non-localized results (Tesseract on the full image).

No.  Recognized text without localization                            Time (ms)
1    F l ff lf vf l V Osborne Garages I J nf N V s I 3 H f I axis m  14785
2    L d v1 M V                                                      2823
3    A 2 f Z d y a m n Ffa fi f m HI xffeu A w x V                   15515
4    i 55                                                            2375
5    C rw M v P s v House Seatrae Communic I g istered Q             6143
6    F                                                               2107
7    RE ERVED ibn WfcLuB SYECRETARY                                  6407
8    EsPAISloL INGNLES IncLEs ESPANOL                                4561
9    Imighrnne                                                       3079
10   Panasonic VNe x                                                 2141
11   f f k i                                                         3674
12   qinl s                                                          1685
13   W 1r1fff WfliSfgith                                             3769
14   I gf g AI U p for 450 yds 9 7                                   4469

Table 2. Localized results (Tesseract on the localized sub-image).

No.  Recognized text with localization     Time (ms)
1    Osborne Garages                       1154
2    London Chelmsford                     974
3    g Colchester                          906
4    Central Eff                           1576
5    Seatrade House                        1188
6    Sal ers                               1564
7    RESERVED FOR CLUB SECRETARY           1552
8    EsPASoL INcLEs ESPANOL                4820
9    Nighrnne                              1639
10   fanasonic                             1373
11   MmDLEBoucHA                           2791
12   PosT QFHCE                            2316
13   WHSmith                               1767
14   F40 fooiway for 450 yds               2062

Fig. 3. Comparison between localized and non-localized OCR results.

The comparison between localized and non-localized OCR results for the 14 images is shown in the graph of Fig. 3, where the red line is the processing time for non-localized images and the blue line is the processing time for localized images. It clearly shows that non-localized images take more than double the time to process with Tesseract, and their accuracy is very low: only 5 of the 14 non-localized outputs are nearly correct, and even these need post-processing for correction, giving an accuracy of 35.71%. Localized images, in contrast, give fast and accurate output, which is essential for sending a correct query to the internet and getting the right search results. Almost all localized images give nearly accurate results: 7 of the 14 localized outputs are correct, and 5 more can be made correct by post-processing, giving an accuracy of 85.71%.

IV. Conclusion

The goal of this work is to compare the time complexity and accuracy of localized and non-localized image processing by Tesseract. Non-localized images take more than double the time to process with Tesseract, which makes the Smartphone application slower. Their accuracy is also very low, which matters most when the text is used by a subsequent function or application; for example, in our proposed scheme the text is fed to a search engine, and an accurate query is needed to get the right results. Without localization the accuracy is 35.71%, while localized images give 85.71% accuracy with Tesseract. So for a Smartphone application, where accuracy and speed are most important, localization is essential.
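As a concrete reference for the localization step of Section II, the block-homogeneity seed-point test can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the image dimensions are taken to be multiples of K, the three ±1 weight patterns are illustrative stand-ins for those in Fig. 1, and the maximum over the M features is used as the norm on δ.

```python
import numpy as np

def seed_points(gray, K=8, T_u=10.0):
    """Return a boolean mask of seed-point pixels for a grayscale image.

    gray: 2-D uint8 luminance array with dimensions divisible by K.
    K:    block size (a power of two); T_u: homogeneity threshold.
    """
    H, W = gray.shape
    M = 3
    # Three zero-sum +/-1 weight patterns (illustrative stand-ins for Fig. 1):
    w = np.ones((M, K, K), dtype=np.int8)
    w[0, :, K // 2:] = -1      # left half +1, right half -1
    w[1, K // 2:, :] = -1      # top half +1, bottom half -1
    w[2, ::2, :] = -1          # alternating rows of -1 and +1
    delta = np.zeros((H // K, W // K, M))
    for by in range(H // K):
        for bx in range(W // K):
            block = gray[by*K:(by+1)*K, bx*K:(bx+1)*K].astype(np.int32)
            # delta^(m) = (2 / K^2) * |sum_i x_i w_i^(m)|, in the range 0..255
            delta[by, bx] = np.abs((w * block).sum(axis=(1, 2))) * 2.0 / K**2
    # A block is homogeneous if all its features fall below the threshold ...
    hom = delta.max(axis=2) < T_u
    # ... and at least one of its four neighboring blocks does as well.
    nb = np.zeros_like(hom)
    nb[1:, :] |= hom[:-1, :]
    nb[:-1, :] |= hom[1:, :]
    nb[:, 1:] |= hom[:, :-1]
    nb[:, :-1] |= hom[:, 1:]
    hom &= nb
    # Seed points are all pixels belonging to the surviving blocks.
    return np.kron(hom.astype(np.uint8), np.ones((K, K), np.uint8)).astype(bool)
```

The mask returned here would feed the subsequent sign-localization stage; block size, patterns, and threshold would need tuning against the actual method of [1].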
The future scope is to decrease the localization time and to attempt post-processing in order to obtain more accurate results.
References

[1] Katherine L. Bouman, Golnaz Abdollahian, "A Low Complexity Sign Detection and Text Localization Method for Mobile Applications," IEEE Transactions on Multimedia, Vol. 13, No. 5, October 2011.
[2] Derek Ma, Qiuhau Lin, Tong Zhang, "Mobile Camera Based Text Detection and Translation," Stanford University, Nov 2000.
[3] Ray Smith, "An Overview of the Tesseract OCR Engine," Google Inc., IEEE 0-7695-2822-8/07, 2007.

Snehal Charjan received her B.E. degree in Computer Science and Engineering from G. H. Raisoni College of Engineering, Nagpur, Maharashtra, India, in 2011, and is pursuing the M.Tech degree in Computer Science and Engineering at Government College of Engineering, Amravati, Maharashtra, India. Her research interests include pattern recognition and optical character recognition. At present, she is engaged in a character localization and recognition application for Smartphones.

Ravi V. Mante received his B.E. degree in Computer Science and Engineering from Government College of Engineering, Amravati, Maharashtra, India, in 2006, and the M.Tech degree in Computer Science and Engineering from the same institution in 2011. He has been an Assistant Professor at Government College of Engineering, Amravati, since 2007. His research interests include ECG signal analysis, soft computing techniques, and cloud computing. At present, he is working with artificial neural networks.

P. N. Chatur received his M.E. degree in Electronics Engineering from Govt. College of Engineering, Amravati, India, and his PhD degree from Amravati University. He has published twenty papers in national and ten papers in international journals. His research areas include neural networks and data mining. Currently he is head of the Computer Science and Engineering department at Govt. College of Engineering, Amravati.