Improved Optical Recognition of Bangla Characters

SUST Studies, Vol. 12, No. 1, 2010; pp. 69-78

Md. Mahbubul Haque, A.Q. M. Saiful Islam, Md. Mahadi Hasan and M. Shahidur Rahman
Department of Computer Science & Engineering, Shahjalal University of Science & Technology, Sylhet-3114, Bangladesh.

Abstract

This paper presents a method for improved optical recognition of Bangla characters, as a step toward building a working system. The recognition process pre-processes the input image, segments it into lines, words and characters, and then recognizes the characters based on a set of classifying criteria. Experience with the OCR (Optical Character Recognition) problem teaches that, for most of the subtasks involved in OCR, no single technique gives perfect results for all the variations of printed text. Based on the features of Bangla characters, we propose a number of classifying features for generating an 11-digit character code, which is then used to identify a particular character. Additionally, the character code is suitable for recognizing conjuncts without breaking them into their constituents. Experimental results show that the proposed algorithm can provide recognition accuracy of up to 99% for regular-shaped distinct basic characters and conjuncts. This offers hope of building a complete working OCR system for Bangla.

1. Introduction

An OCR system can be regarded as one of the most useful tools in the digitization of literature. It is, however, a challenging task to build a practical OCR system [1, 2]. Several problems are encountered in processing a document. The text zone of the document needs to be extracted before recognition of the text can take place. In printed text, skew results in overlap between text lines, which makes it difficult to split the lines. Background noise can further complicate the situation. Ink spread results in character fusions, and fading can fragment a character.
In OCR, there are conflicting demands: a large set of natural variants must be classified into a single class, while closely resembling patterns must still be discriminated. The last 50 years of research have clearly demonstrated that no single strategy is sufficient for dealing with the complexity of the problem. Moreover, the strategy cannot be the same for reading texts of different scripts and languages. Bangla script is alphabetic in nature, and its words are two-dimensional compositions of characters and symbols, which makes it different from Roman and ideographic scripts. Algorithms that perform well for other scripts can be applied only after extensive pre-processing, which makes simple adaptation ineffective.

An optical character recognition system usually consists of two main stages, namely segmentation and recognition. There is a possibility of ambiguity at each stage. Line segmentation may be ambiguous due to tilt and overlapping of text lines. Some closely written words may not get segmented into individual words. Characters may be fused together or fragmented by unwanted breaks, resulting in wrong segmentation. Very often, adjacent characters touch each other and may occupy overlapping fields. Therefore, it is a complex task to segment a given word correctly into its character components.

B. B. Chaudhuri and U. Pal have done extensive work in the area of Bangla OCR as well as segmentation [3-6]. Alamgir et al. have reported various aspects of Bangla script recognition in [7]. Their post-processing system is based on contextual knowledge that checks the composition syntax. Molla and Talukder [8] have described Bangla numeral recognition based on a structural approach. Neural network approaches for isolated characters have also been reported [9-11]. One of the recent works on character segmentation is found in [12], where the solution concerns the reformation of the characters of the upper and middle zones. Hasnat et al. [13] discussed the problems and provided a training-dependent solution to reduce the segmentation error. Hasnat and Khan dealt with eliminating splitting errors in [14], where they applied rule-based feature information in addition to some classifying features.

Since the basic characters (vowels and consonants) constitute 94%-96% of the text, most research emphasizes recognizing the basic characters. In this paper, in addition to the basic characters, we propose new classifying features for identifying conjuncts. Experimental results show satisfactory accuracy for distinct basic characters and conjuncts. Research is currently under way to attain acceptable results at the sentence and paragraph level, so that a true working Bangla OCR system can be built.

2. Composition of Bangla script from OCR viewpoint

Bangla script can be summarized as comprising 49 basic characters, 14 modifiers called kar, fola, ref and hashanta (placed to the left, right, above, or at the bottom of a character or conjunct), more than [...] conjuncts, 10 numerals, and 25 symbols and punctuation marks [15]. Some of the modifiers are placed before or after the consonant (core modifiers), some above it (top modifiers) and some below it (lower modifiers). In Bangla, a consonant has a pure form as well as a shortened form. Bangla script contains a pure form for all the consonants and a shortened form for many of them.
A consonant in shortened form always touches the next character, yielding a conjunct. Besides the numerals, Bangla script has punctuation marks that help convey the actual meaning of the text.

A horizontal line, referred to as the header line, is drawn on top of all the characters of a word. It is convenient to visualize a Bangla word in terms of three strips: a core strip, a top strip and a bottom strip. The core and top strips are separated by the header line. The top strip contains the top modifiers, the bottom strip contains the lower modifiers, and the core strip contains the characters and core modifiers. If no consonant of a word has a top modifier, the top strip is empty. Similarly, if no character of a word has a lower modifier, the bottom strip is empty. A word may therefore have a top strip, a bottom strip, or both.

3. Recognition Methodology

A document can contain images, graphs, tables, etc., in addition to text. Extraction of text zones from a document has been extensively studied [10, 11, 12] and continues to be an active research area. In this paper, we employ a pre-processing stage that extracts uniform text zones from the document image. The current system segments each uniform text zone into text lines and the text lines into words. Words are further segmented into characters and symbols. These characters and symbols may not be valid Bangla symbols when viewed in isolation. We refer to the characters and symbols as recognition units. Extraction of the units from the preprocessed document image is performed by the steps outlined in Fig. 1.

A. Line Segmentation

Bangla script is written from left to right and top to bottom. A text line is separated from the previous and following text lines by white space. Line segmentation is therefore based on the horizontal histogram of the uniform text zone. A zero value in the histogram corresponds to a horizontal gap, and the horizontal gaps are assumed to be the line boundaries.
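The histogram-based line segmentation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the text zone is already binarized as a list of rows with 1 for a black pixel and 0 for a white pixel, and the function names `horizontal_histogram` and `segment_lines` are hypothetical.

```python
def horizontal_histogram(image):
    """Count the black pixels (1s) in each row of a binary image."""
    return [sum(row) for row in image]


def segment_lines(image):
    """Return (start_row, end_row) pairs, end exclusive, for each text line.

    A row whose histogram value is zero is treated as a gap between lines.
    """
    lines = []
    start = None
    for y, count in enumerate(horizontal_histogram(image)):
        if count > 0 and start is None:
            start = y                  # first inked row of a new line
        elif count == 0 and start is not None:
            lines.append((start, y))   # a gap row closes the current line
            start = None
    if start is not None:              # text runs to the bottom edge
        lines.append((start, len(image)))
    return lines


# Tiny assumed example: two "lines" separated by two blank rows.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
print(segment_lines(page))  # [(1, 3), (5, 6)]
```

On a real scan the zero test would usually need a small noise tolerance, and skewed lines would have to be corrected first, since skew removes the zero-valued gaps.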

B. Identify Header-Line Position

The position of the header line is a dominant feature for extracting characters and modifiers correctly. The header line is a long horizontal stroke that spans a word from its left end to its right end. Because it is common to the characters of a word, the horizontal projection shows it as a sharp peak: the pixel count rises rapidly and falls rapidly again, over a width of only 1 or 2 pixels. The system checks the rows one after another for such a rapid variation in pixel count; if one is found, its position is stored.

Fig. 1: The outline of the proposed method

  Scan the document image
  Preprocessing: text area recognition
  Segmentation:
  => Line segmentation
       Start of line
       End of line
  => Word segmentation
       Detection of top strip
       Detection of core strip
       Detection of bottom strip
  => Header line detection
  => Character segmentation
       Preliminary character segmentation
       Defined character bound
       Shadow character segmentation
       Upper and lower modifier segmentation
  => Character recognition
       Categorize character
         Does header exist?
         Bar position? 1. Pre bar? 2. Mid bar? 3. End bar? 4. No bar?
         Is upper modifier?
         Is connected with header line?
       Generate unique character code
         Character code = Criteria code + Character sequence (horizontal & vertical)
       Identify the character according to the character code
  Construction of words and sentences

C. Segmentation of a text line into Words

In Bangla script, the characters and symbols of a word are joined together by the header line. As a result, word boundaries are rarely ambiguous. A gap in the header line creates no problem for identifying word boundaries. A vertical histogram of the text line is made, and every gap of two or more pixels in the histogram is taken to be a word delimiter.
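The word segmentation rule of Section C (a gap of two or more pixels in the vertical histogram delimits words) can be sketched as below. The binary-image representation (1 for black, 0 for white) and the names `vertical_histogram`, `segment_words` and `min_gap` are assumptions made for this illustration.

```python
def vertical_histogram(line_image):
    """Count the black pixels (1s) in each column of a binary text-line image."""
    return [sum(col) for col in zip(*line_image)]


def segment_words(line_image, min_gap=2):
    """Return (start_col, end_col) pairs, end exclusive, for each word.

    A run of at least `min_gap` empty columns is taken as a word delimiter;
    shorter runs stay inside the word.
    """
    hist = vertical_histogram(line_image)
    words = []
    start = None      # first inked column of the current word
    last_ink = None   # most recent inked column
    for x, count in enumerate(hist):
        if count == 0:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_gap:
            words.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        words.append((start, last_ink + 1))
    return words


# Assumed example: a one-column gap (kept) and a two-column gap (delimiter).
line = [
    [1, 1, 1, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
]
print(segment_words(line))  # [(0, 6), (8, 10)]
```

Keeping `min_gap` as a parameter makes it easy to adapt the threshold to the scan resolution, which the two-pixel rule implicitly depends on.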

D. Segmentation of a word into symbols and characters

The header line joins the characters of a word together, which makes the segmentation of a word into its constituent characters somewhat difficult. The region above the header line contains the upper modifier symbols. The region below the header line contains the core characters and the lower modifier symbols. Before segmentation can progress any further, the header line must be identified. It is easily identified, as it is the most dominant horizontal line in a word. After the header line is removed, vertical gaps separate the top modifiers from their neighbors. The characters below the header line are likewise separated from their neighbors by vertical gaps.

Fig. 2: Character segmentation from text line (Steps 1 to 6, with the vertical histogram)

E. Segmentation of Modifiers

Modifiers placed to the left or right of a core character can be easily segmented from the vertical projection profile (e.g. বাংলাদেশ in Fig. 2). However, separating upper and lower modifiers is more tedious. Horizontal and vertical projections of an upper modifier are taken over the region from (0, 0) to (character width, header-line position). If an upper modifier joins a left or right modifier at the header line, the concatenation of both is considered a single modifier (as in the cases of 'ি' and 'ী'). Fig. 3 shows an upper modifier, its horizontal and vertical projections, and its concatenation with another (left or right) modifier.

Fig. 3: Segmentation process of the upper modifier (upper modifier, horizontal projection, vertical projection, concatenated result)
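The header-line removal and gap-based splitting of Section D can be sketched as follows. This is a simplified illustration under stated assumptions: the word image is binary (1 for black), the header line is taken to be the single row with the most black pixels, and only the region below it is split. A header line two pixels thick, which the paper allows for, would need its second row removed as well. The names `find_header_row` and `segment_characters` are hypothetical.

```python
def find_header_row(word_image):
    """Return the index of the row with the most black pixels."""
    counts = [sum(row) for row in word_image]
    return counts.index(max(counts))


def segment_characters(word_image):
    """Split the region below the header line at empty columns.

    Returns (start_col, end_col) pairs, end exclusive.
    """
    header = find_header_row(word_image)
    below = word_image[header + 1:]      # drop the header line and top strip
    width = len(word_image[0])
    hist = [sum(row[x] for row in below) for x in range(width)]
    chars = []
    start = None
    for x, count in enumerate(hist):
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            chars.append((start, x))
            start = None
    if start is not None:
        chars.append((start, width))
    return chars


# Assumed example: row 1 is the header line joining two character bodies.
word = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 1, 1],
]
print(find_header_row(word))     # 1
print(segment_characters(word))  # [(0, 2), (4, 7)]
```

The same column-gap scan could be applied to the rows above the header line to isolate top modifiers.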

Lower modifiers may have a thick joining, a weak joining or no joining at all with the core character. If the lower modifier has no joining with the core character, it can be readily separated by an outer boundary traversal (Fig. 4a). If the joining is thick or weak, a threshold height of the character is determined first, and the point of lowest pixel density below that threshold height is then located. This position is treated as the segmentation point where the core character is separated from the modifier. Fig. 4b shows this case.

Fig. 4: Segmentation process of the lower modifier: (a) no joining, (b) thick joining

4. Recognition of Characters

A. After the word is segmented, the recognition algorithm is applied to the characters. The salient features employed in the recognition process are summarized below.

i. Existence of the header line

In Bangla script, some characters have a complete header line whereas others do not. This can be the first criterion for categorizing the characters, which makes the recognition process easier and faster.

Fig. 5: Recognition of the header line

ii. Position of the Vertical Bar

Bangla characters often exhibit a vertical bar that starts from the header line and spans the whole height of the core strip. Characters can accordingly be classified as:

  Pre Bar characters
  Mid Bar characters
  End Bar characters
  Non Bar characters

Fig. 6: Recognition of the placement of the vertical bars

iii. Top strip

Some characters have a portion in the top strip, which is absent in other characters.

Fig. 7: Recognition of the existence of the top strip

iv. Connection to the Header Line

In some characters the header line is connected with the rest of the character, whereas in others the character does not touch the header line, as in 'ত' and 'ভ'.

Fig. 8: Verify if a character touches the header line

v. Dot as a lower modifier

Recognizing the presence of a dot at the lower part of the character further helps classify the characters.

Fig. 9: Recognition of the existence of the dot at the lower part of the character

In addition to the features stated above, some other classifying features can be used for improved accuracy, for example extra connections with the header line (স, ন), the place of connection with the end bar (ন, স) and the number of connections with the end bar (ন, ষ).

B. Horizontal Zero Crossings

The image of a character is treated as an array of pixels. A black pixel is expressed by 1 and a white pixel by 0. The whole array is traced row by row, and the number of transitions from a black pixel to a white pixel is recorded for each row. Let N_i be the number of transitions for the i-th row. The sequence N_i, 0 <= i < n, where n is the pixel height of the character, is referred to as the horizontal zero-crossing sequence S.
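A minimal sketch of how the sequence S could be computed is given below. It assumes the character image is a list of rows of 0/1 values, and it makes one choice the text does not specify: a black run that touches the right edge of the image is counted as one transition, as if the image were padded with white. The function name `horizontal_zero_crossings` is hypothetical.

```python
def horizontal_zero_crossings(char_image):
    """Return S = [N_0, ..., N_(n-1)]: black-to-white transitions per row."""
    sequence = []
    for row in char_image:
        transitions = 0
        for left, right in zip(row, row[1:]):
            if left == 1 and right == 0:
                transitions += 1
        if row and row[-1] == 1:
            transitions += 1   # assumption: the right edge counts as white
        sequence.append(transitions)
    return sequence


# Assumed 4-row example image.
glyph = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
print(horizontal_zero_crossings(glyph))  # [1, 3, 0, 1]
```

With the edge rule above, each N_i equals the number of separate black runs in row i, which is what makes the sequence insensitive to stroke thickness.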

Fig. 10: The process of horizontal zero-crossings

A character is divided into horizontal segments of equal height, and each segment is represented by the number of zero-transitions that is most frequent in the segment. Specifically, we divide the sequence S into three sub-sequences of equal length, S1, S2 and S3. For each sub-sequence, the most frequent number is stored in S1-Most, S2-Most and S3-Most. The feature vector (S1-Most, S2-Most, S3-Most) is then used to filter out false character candidates. Table 1 shows the obtained sequences, which are the same for all 3 font sizes of 'ক' and 'খ': 131 and 122, respectively. Similar results have been obtained using four different font faces.

Table 1: Sequence S for four samples of the characters 'ক' and 'খ' (each sample sequence for 'ক' reduces to 131 and each sample sequence for 'খ' reduces to 122)

C. Vertical Zero Crossings

Vertical zero crossing works in the same way as horizontal zero crossing, except that the image block is divided into several vertical segments. The process of vertical zero crossing yields another 3-digit code.

Fig. 11: The process of vertical zero-crossings

D. Recognition from the Generated Code sequence

The processes in Sections A, B and C produce an 11-digit code (5 digits of category code, 3 digits from horizontal zero crossings and 3 digits from vertical zero crossings) that is employed for identifying a character.
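The reduction of a zero-crossing sequence to a 3-digit code, and the assembly of the 11-digit character code, can be sketched as follows. This is an illustration under assumptions rather than the authors' program: the split points for sequences whose length is not a multiple of three, the tie-breaking rule, and the names `most_frequent`, `zero_crossing_code` and `character_code` are all hypothetical, and the sample sequences are invented to show the arithmetic. The same `zero_crossing_code` function applies to a column-wise (vertical) sequence.

```python
from collections import Counter


def most_frequent(values):
    """Return the most common value; ties go to the value seen first."""
    return Counter(values).most_common(1)[0][0]


def zero_crossing_code(sequence):
    """Return (S1-Most, S2-Most, S3-Most) for a zero-crossing sequence."""
    n = len(sequence)
    if n < 3:
        raise ValueError("sequence needs at least three entries")
    bounds = [0, n // 3, 2 * n // 3, n]   # three near-equal sub-sequences
    return tuple(
        most_frequent(sequence[bounds[k]:bounds[k + 1]]) for k in range(3)
    )


def character_code(header, bar, upper, connected, dot, horizontal, vertical):
    """Join 5 criteria digits with the two 3-digit zero-crossing codes."""
    if bar not in (1, 2, 3, 4):
        raise ValueError("bar must be 1 (pre), 2 (mid), 3 (end) or 4 (none)")
    digits = [int(header), bar, int(upper), int(connected), int(dot)]
    digits += list(horizontal) + list(vertical)
    if len(digits) != 11 or any(not 0 <= d <= 9 for d in digits):
        raise ValueError("expected 11 single-digit values")
    return "".join(str(d) for d in digits)


# Hypothetical sequences, chosen only to illustrate the arithmetic.
S_horizontal = [1, 1, 1, 2, 3, 3, 3, 1, 1]
S_vertical = [2, 2, 1, 2, 2, 2, 1, 1, 1]
h_code = zero_crossing_code(S_horizontal)   # (1, 3, 1)
v_code = zero_crossing_code(S_vertical)     # (2, 2, 1)
code = character_code(True, 4, False, True, False, h_code, v_code)
print(code)  # 14010131221
```

A recognizer built this way would look the resulting string up in a table of known codes for the font in use; because the code can vary with font face and size, such a table would need one entry per variant.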

Table 2: Generating character code using the classifying features

  Criteria code (5 digits)
    Header:                  1 = Yes, 0 = No
    Bar:                     1 = Pre bar, 2 = Mid bar, 3 = End bar, 4 = No bar
    Upper modifier:          1 = Yes, 0 = No
    Connection with header:  1 = Yes, 0 = No
    Existence of dot:        1 = Yes, 0 = No
  Horizontal code (3 digits)
  Vertical code (3 digits)

  Character              Unicode
  'ক' (K in SutonnyMJ)   004b
  'খ' (L in SutonnyMJ)   004c

The obtained character code can vary according to font face and size. In the next section we present experimental results for the font face SutonnyMJ.

5. Recognition Results

Four snapshots from the program output are shown in Fig. 5. As seen in the figure, four types of input (same-size basic characters, different-size basic characters, same-size conjuncts and different-size conjuncts) have been used to verify the performance of the proposed method. The font size varies from 14 to 32 points. The experimental setup and the obtained recognition accuracy are summarized in Table 3.

Table 3: Recognition accuracy of the proposed method

  Font size | No. of characters (basic) | Recognition accuracy (%) | No. of characters (conjuncts) | Recognition accuracy (%)

6. Conclusion

In this paper, we have presented a method for recognizing Bangla characters. The results presented here, obtained with frequently used basic characters and conjuncts, are satisfactory. The accuracy, however, decreases when recognizing characters from a complete sentence. Research is under way to improve the accuracy on characters segmented from sentences. At present, we consider only regular-shaped fonts of varying sizes. By incorporating a neural-network-based post-processor, we aim to build a complete working Bangla OCR system that addresses more variations in printed Bangla text.

References

[1] S. Kahan, T. Pavlidis, and H.S. Baird, "On the recognition of printed characters of any font and size," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 9(2).
[2] M-K. Kim, Y-B. Kwon, "Multi-font and Multi-Size Character Recognition Based on the Sampling and Quantization of an Unwrapped Contour," Proc. of Intl. Conf. on Pattern Recognition, vol. 3, pp. 170, Vienna, Aug 25-29.
[3] U. Pal, B. B. Chaudhuri, "OCR in Bangla: an Indo-Bangladeshi Language," Proc. of ICPR, Jerusalem, Israel.
[4] B. B. Chaudhuri, U. Pal, "An OCR System to Read Two Indian Language Scripts: Bangla and Devnagari (Hindi)," Proc. of 4th ICDAR, vol. 2, Ulm, Germany.
[5] U. Garain and B. B. Chaudhuri, "Segmentation of Touching Characters in Printed Devnagari and Bangla Scripts Using Fuzzy Multifactorial Analysis," IEEE Transactions on Systems, Man and Cybernetics, vol. 32, Nov.
[6] B. B. Chaudhuri, "Digital document processing: major directions and recent advances," Springer, London.
[7] Mohammed Alamgir, Md. Khademul Islam Molla, and Muhammed Zafar Iqbal, "Bangla Character Recognition System," Proc. of ICCIT 99, 3-5 December.
[8] Md. Khademul Islam Molla and Kamrul Hasan Talukder, "Bangla Number Extraction and Recognition from Document Image," Proc. of ICCIT, Dhaka.
[9] A. A. Chowdhury, Ejaj Ahmed, S. Ahmed, S. Hossain and C. M. Rahman, "Optical Character Recognition of Bangla Characters using neural network: A better approach," Proc. of 2nd ICEE.
[10] J. U. Mahmud, M. F. Raihan and C. M. Rahman, "A Complete OCR System for Continuous Bangla Characters," Proc. of the Conf. on Convergent Technologies.
[11] S. M. Shoeb Shatil and Mumit Khan, "Minimally Segmenting High Performance Bangla OCR using Kohonen Network," Proc. of 9th ICCIT.
[12] Md. Abdus Sattar, Khaled Mahmud, Humayun Arafat and A F M Noor Uz Zaman, "Segmenting Bangla Text for Optical Recognition," Proc. of ICCIT.
[13] Md. Abul Hasnat, S M Murtoza Habib and Mumit Khan, "A High Performance Domain Specific OCR for Bangla Script," Proc. of Int. Joint Conf. on Computer, Information, and Systems Sciences, and Engineering (CISSE).
[14] Md. Abul Hasnat and Mumit Khan, "Elimination of Splitting Errors in Printed Bangla Scripts," Proc. of the Conference on Language & Technology 2009, January 2009.
[15] A. Elahi, "Bangla OCR: A Complete Working System," B.Sc. Thesis, 2006, Dept. of Computer Science & Engineering, Shahjalal University of Science & Technology, Sylhet, Bangladesh.

Fig. 5: Snapshots from the program output

Submitted: 8th July 2009; Accepted for Publication: 25th January, 2010.

Optical Character Recognition (OCR) for Printed Devnagari Script Using Artificial Neural Network

Optical Character Recognition (OCR) for Printed Devnagari Script Using Artificial Neural Network International Journal of Computer Science & Communication Vol. 1, No. 1, January-June 2010, pp. 91-95 Optical Character Recognition (OCR) for Printed Devnagari Script Using Artificial Neural Network Raghuraj

More information

Minimally Segmenting High Performance Bangla Optical Character Recognition Using Kohonen Network

Minimally Segmenting High Performance Bangla Optical Character Recognition Using Kohonen Network Minimally Segmenting High Performance Bangla Optical Character Recognition Using Kohonen Network Adnan Mohammad Shoeb Shatil and Mumit Khan Computer Science and Engineering, BRAC University, Dhaka, Bangladesh

More information

Isolated Handwritten Words Segmentation Techniques in Gurmukhi Script

Isolated Handwritten Words Segmentation Techniques in Gurmukhi Script Isolated Handwritten Words Segmentation Techniques in Gurmukhi Script Galaxy Bansal Dharamveer Sharma ABSTRACT Segmentation of handwritten words is a challenging task primarily because of structural features

More information

A New Technique for Segmentation of Handwritten Numerical Strings of Bangla Language

A New Technique for Segmentation of Handwritten Numerical Strings of Bangla Language I.J. Information Technology and Computer Science, 2013, 05, 38-43 Published Online April 2013 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijitcs.2013.05.05 A New Technique for Segmentation of Handwritten

More information

Segmentation free Bangla OCR using HMM: Training and Recognition

Segmentation free Bangla OCR using HMM: Training and Recognition Segmentation free Bangla OCR using HMM: Training and Recognition Md. Abul Hasnat, S.M. Murtoza Habib, Mumit Khan BRAC University, Bangladesh mhasnat@gmail.com, murtoza@gmail.com, mumit@bracuniversity.ac.bd

More information

Optical Character Recognition For Bangla Documents Using HMM

Optical Character Recognition For Bangla Documents Using HMM Optical Character Recognition For Bangla Documents Using HMM Md. Sheemam Monjel and Mumit Khan Dept. of CSE, BRAC University, Dhaka, Bangladesh. sheemam@bracuniversity.net, mumit@bracuniversity.net Abstract

More information

Skew Angle Detection of Bangla Script using Radon Transform

Skew Angle Detection of Bangla Script using Radon Transform Skew Angle Detection of Bangla Script using Radon Transform S. M. Murtoza Habib, Nawsher Ahamed Noor and Mumit Khan Center for Research on Bangla Language Processing, BRAC University, Dhaka, Bangladesh.

More information

A Survey of Problems of Overlapped Handwritten Characters in Recognition process for Gurmukhi Script

A Survey of Problems of Overlapped Handwritten Characters in Recognition process for Gurmukhi Script A Survey of Problems of Overlapped Handwritten Characters in Recognition process for Gurmukhi Script Arwinder Kaur 1, Ashok Kumar Bathla 2 1 M. Tech. Student, CE Dept., 2 Assistant Professor, CE Dept.,

More information

FRAGMENTATION OF HANDWRITTEN TOUCHING CHARACTERS IN DEVANAGARI SCRIPT

FRAGMENTATION OF HANDWRITTEN TOUCHING CHARACTERS IN DEVANAGARI SCRIPT International Journal of Information Technology, Modeling and Computing (IJITMC) Vol. 2, No. 1, February 2014 FRAGMENTATION OF HANDWRITTEN TOUCHING CHARACTERS IN DEVANAGARI SCRIPT Shuchi Kapoor 1 and Vivek

More information

Segmentation of Bangla Handwritten Text

Segmentation of Bangla Handwritten Text Thesis Report Segmentation of Bangla Handwritten Text Submitted By: Sabbir Sadik ID:09301027 Md. Numan Sarwar ID: 09201027 CSE Department BRAC University Supervisor: Professor Dr. Mumit Khan Date: 13 th

More information

An Efficient Character Segmentation Based on VNP Algorithm

An Efficient Character Segmentation Based on VNP Algorithm Research Journal of Applied Sciences, Engineering and Technology 4(24): 5438-5442, 2012 ISSN: 2040-7467 Maxwell Scientific organization, 2012 Submitted: March 18, 2012 Accepted: April 14, 2012 Published:

More information

Optical Character Recognition of Bangla Characters using neural network: A better approach

Optical Character Recognition of Bangla Characters using neural network: A better approach Optical Character Recognition of Bangla Characters using neural network: A better approach Ahmed Asif Chowdhury, Ejaj Ahmed, Shameem Ahmed, Shohrab Hossain, Chowdhury Mofizur Rahman Department of Computer

More information

Segmentation Based Optical Character Recognition for Handwritten Marathi characters

Segmentation Based Optical Character Recognition for Handwritten Marathi characters Segmentation Based Optical Character Recognition for Handwritten Marathi characters Madhav Vaidya 1, Yashwant Joshi 2,Milind Bhalerao 3 Department of Information Technology 1 Department of Electronics

More information

A System for Joining and Recognition of Broken Bangla Numerals for Indian Postal Automation

A System for Joining and Recognition of Broken Bangla Numerals for Indian Postal Automation A System for Joining and Recognition of Broken Bangla Numerals for Indian Postal Automation K. Roy, U. Pal and B. B. Chaudhuri CVPR Unit; Indian Statistical Institute, Kolkata-108; India umapada@isical.ac.in

More information

OCR For Handwritten Marathi Script

OCR For Handwritten Marathi Script International Journal of Scientific & Engineering Research Volume 3, Issue 8, August-2012 1 OCR For Handwritten Marathi Script Mrs.Vinaya. S. Tapkir 1, Mrs.Sushma.D.Shelke 2 1 Maharashtra Academy Of Engineering,

More information

Segmentation free Bangla OCR using HMM: Training and Recognition

Segmentation free Bangla OCR using HMM: Training and Recognition Segmentation free Bangla OCR using HMM: Training and Recognition Md. Abul Hasnat BRAC University, Bangladesh mhasnat@gmail.com S. M. Murtoza Habib BRAC University, Bangladesh murtoza@gmail.com Mumit Khan

More information

Recognition of Unconstrained Malayalam Handwritten Numeral

Recognition of Unconstrained Malayalam Handwritten Numeral Recognition of Unconstrained Malayalam Handwritten Numeral U. Pal, S. Kundu, Y. Ali, H. Islam and N. Tripathy C VPR Unit, Indian Statistical Institute, Kolkata-108, India Email: umapada@isical.ac.in Abstract

More information

Research Report on Bangla OCR Training and Testing Methods

Research Report on Bangla OCR Training and Testing Methods Research Report on Bangla OCR Training and Testing Methods Md. Abul Hasnat BRAC University, Dhaka, Bangladesh. hasnat@bracu.ac.bd Abstract In this paper we present the training and recognition mechanism

More information

Indian Multi-Script Full Pin-code String Recognition for Postal Automation

Indian Multi-Script Full Pin-code String Recognition for Postal Automation 2009 10th International Conference on Document Analysis and Recognition Indian Multi-Script Full Pin-code String Recognition for Postal Automation U. Pal 1, R. K. Roy 1, K. Roy 2 and F. Kimura 3 1 Computer

More information

BANGLA OPTICAL CHARACTER RECOGNITION. A Thesis. Submitted to the Department of Computer Science and Engineering. BRAC University. S. M.

BANGLA OPTICAL CHARACTER RECOGNITION. A Thesis. Submitted to the Department of Computer Science and Engineering. BRAC University. S. M. BANGLA OPTICAL CHARACTER RECOGNITION A Thesis Submitted to the Department of Computer Science and Engineering of BRAC University by S. M. Murtoza Habib Student ID: 01201071 In Partial Fulfillment of the

More information

Building Multi Script OCR for Brahmi Scripts: Selection of Efficient Features

Building Multi Script OCR for Brahmi Scripts: Selection of Efficient Features Building Multi Script OCR for Brahmi Scripts: Selection of Efficient Features Md. Abul Hasnat Center for Research on Bangla Language Processing (CRBLP) Center for Research on Bangla Language Processing

More information

Fine Classification of Unconstrained Handwritten Persian/Arabic Numerals by Removing Confusion amongst Similar Classes

Fine Classification of Unconstrained Handwritten Persian/Arabic Numerals by Removing Confusion amongst Similar Classes 2009 10th International Conference on Document Analysis and Recognition Fine Classification of Unconstrained Handwritten Persian/Arabic Numerals by Removing Confusion amongst Similar Classes Alireza Alaei

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK HANDWRITTEN DEVANAGARI CHARACTERS RECOGNITION THROUGH SEGMENTATION AND ARTIFICIAL

More information

A two-stage approach for segmentation of handwritten Bangla word images

A two-stage approach for segmentation of handwritten Bangla word images A two-stage approach for segmentation of handwritten Bangla word images Ram Sarkar, Nibaran Das, Subhadip Basu, Mahantapas Kundu, Mita Nasipuri #, Dipak Kumar Basu Computer Science & Engineering Department,

More information

A FEATURE BASED CHAIN CODE METHOD FOR IDENTIFYING PRINTED BENGALI CHARACTERS

A FEATURE BASED CHAIN CODE METHOD FOR IDENTIFYING PRINTED BENGALI CHARACTERS A FEATURE BASED CHAIN CODE METHOD FOR IDENTIFYING PRINTED BENGALI CHARACTERS Ankita Sikdar 1, Payal Roy 1, Somdeep Mukherjee 1, Moumita Das 1 and Sreeparna Banerjee 2 1 Department of Computer Science and

More information

Bangla Character Recognition System is Developed by using Automatic Feature Extraction and XOR Operation

Bangla Character Recognition System is Developed by using Automatic Feature Extraction and XOR Operation Global Journal of Computer Science and Technology Graphics & Vision Volume 13 Issue 2 Version 1.0 Year 2013 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc.

More information

A Segmentation Free Approach to Arabic and Urdu OCR
