Improved Optical Recognition of Bangla Characters
|
|
- Tyrone Barber
- 5 years ago
- Views:
Transcription
1 SUST Studies, Vol. 12, No. 1, 2010; P:69-78 Improved Optical Recognition of Bangla Characters Md. Mahbubul Haque, A.Q. M. Saiful Islam, Md. Mahadi Hasan and M. Shahidur Rahman Department of Computer Science & Engineering, Shahjalal University of Science & Technology Sylhet-3114, Bangladesh. Abstract This paper presents a method on improved optical recognition of Bangla characters toward building a working system. The recognition process performs pre-processing of input image, segmentation of both lines, words and characters and then recognizes the characters based on the classifying criteria. Experience with OCR (Optical Character Recognition) problem teaches that for most subtasks involve in OCR, there is no single technique that gives perfect results for all the variations of printed text. Based on the features of Bangla characters, we have proposed a number of classifying features for generating an 11-digit character code which is then used to identify a particular character. Additionally, the character code is suitable of recognizing the conjuncts without breaking into their constituents. Experimental results show that the proposed algorithm can provide recognition accuracy up to 99% for regular-shaped distinct basic characters and conjuncts. This shows us a hope toward building a complete working system in Bangla. 1. Introduction An OCR can be regarded as one of the most useful tool in the process of digitization of literature. It is, however, a challenging task to build a practical OCR [1, 2]. There are several problems encountered in processing a document. The text zone of the document needs to be extracted before the recognition of the text can take place. In printed text, skew results in overlap between the text lines that makes it difficult to split the lines. The background noise can further complicate the situation. The ink spread results in character fusions and fading can fragment a character. In OCR, there is a conflicting demand of classifying a large set of natural variants into a single class and at the same time discriminate between closely resembling patterns. The last 50 years of research has clearly demonstrated that no single strategy is sufficient for dealing with the complexity of the problem. Moreover, the strategy cannot be same for reading texts of different scripts/languages. Bangla script is alphabetic in nature and the words are two dimensional compositions of characters and symbols which makes it different from Roman and ideographic scripts. The algorithms which perform well for other scripts can be applied only after extensive pre-processing which makes simple adaptation ineffective. An optical character recognition system usually consists of two main stages, namely segmentation and recognition. There is a possibility of ambiguity at each stage. The line segmentation may be ambiguous due to tilt and overlapping of text lines. Some of the closely written words may not get segmented into individual words. The characters may be fused due to unwanted breaks, resulting in wrong segmentation. Very often, adjacent characters are touching, and may exist in an overlapped field. Therefore, it is a complex task to segment a given word correctly into its character components. B. B. Chaudhuri and U. Pal have done a great amount of tasks in the area of Bangla OCR as well as segmentation [3-6]. Alamgir have reported various aspects of Bangla script recognition in [7]. The post-
2 70 Md. Mahbubul Haque, A.Q. M. Saiful Islam, Md. Mahadi Hasan and M. Shahidur Rahman processing system is based on contextual knowledge which checks the composition syntax. Khademul [8] have described Bangla numeral recognition based on the structural approach. Neural network approach for isolated characters has also been reported [9-11]. One of the recent works in character segmentation is found in [12] where the solution concerns about the reformation of the characters of upper and middle zone. Hasnat et. al. at [13] discussed the problems and provide a training dependent solution to reduce the segmentation error. Hasnat and Khan dealt with eliminating splitting errors in [14] where they applied rule-based feature information in addition with some classifying features. Since the basic characters (vowels and consonants) constitute 94%-96% of the text, most the researches emphasize on recognizing the basic characters. In addition with the basic characters, in this paper, we have proposed new classifying features for identifying conjuncts. Experimental results show satisfactory accuracy for distinct basic characters and conjuncts. Currently, research is going on to attain acceptable result up to sentence and paragraph-level so that a true working system can be built on Bangla OCR. 2. Composition of Bangla script from OCR viewpoint Bangla script can be summarized as comprising 49 basic characters, 14 modifiers called kar, fola, ref and hashanta (placed to the left, right, above, or at the bottom of a character or conjunct), more than conjuncts, 10 numerals and 25 symbols and punctuation marks [15]. Some of the modifiers are placed before or next to the consonant (core modifiers), some are above (top modifiers) and some are placed below (lower modifiers) a consonant. In Bangla, a consonant has a pure form as well as a shortened form. Bangla script contains a pure form for all the consonants and shortened form of many of the consonants. A consonant in shortened form always touches the next character, yielding conjuncts. Other than the numeric symbols, there are some punctuation marks in Bangla script which are used to provide actual meaning to the text. A horizontal line is drawn on top of all characters of a word that is referred to as header line. It is convenient to visualize a Bangla word in terms of three strips: a core strip, a top strip and a bottom strip. The core and top strips are separated by the header line. The top strip contains the top modifiers, the bottom strip contains lower modifiers and the core strip contains the characters and core modifiers. If no consonant of a word has top modifier, the top strip will be empty. Similarly, if no character of a word has lower modifier, the bottom strip will be empty. It is possible that either of the bottom or top strips, or both the strips are present. 3. Recognition Methodology A script can contain images, graphs, tables etc, in addition to the text. Extraction of text-zones from the document has been extensively studied [10, 11, 12] and still continues to be an active research area. In this paper, we employ a pre-processing stage that extracts uniform text zones from the document image. The current system segments each uniform text zone into text lines and text lines into words. Words are further segmented into characters and symbols. The characters and symbols may not be valid Bangla symbols when viewed in isolation. We refer to characters and symbols as recognition unit. Extraction of unit from preprocessed document Image can be performed by the steps outlined in Fig. 1. A. Line Segmentation Bangla script is written from left to right and top to bottom. A text line is separated from the previous and following text lines by white space. This segmentation is based on horizontal histograms of the document. A horizontal histogram of the uniform text zone is produced. A zero value in the histogram corresponds to a horizontal gap. The horizontal gaps are assumed to be the line boundaries.
3 Improved Optical Recognition of Bangla Characters 71 B. Identify Header-Line Position Position of the header-line is a dominating feature for extracting characters and modifiers correctly. Headerline is a long vertical stroke started from left and spans to the right from start to end. As this is the common line, the horizontal projection shows it as an instantaneous stroke at the plot. The stroke becomes very high rapidly and falls down rapidly again having a width of only 1 or 2 pixels. The system checks if there is a rapid variation in pixel count while checking the rows one after another. If it is found the position is stored. Scan the document Image Preprocessing: Text area recognition Segmentation: => Line segmentation Star of line End of line => Word segmentation Detection of Top strip Detection of Core strip Bottom script of word detection => Header line detection => Character Segmentation Preliminary Character Segmentation Defined Character Bound Shadow Character Segmentation Upper and Lower Modifier Segmentation => Character Recognition Categorize Character Does Header Exist? Bar Position? 1. Pre bar? 2. Mid bar? 3. End bar? 4. No bar? Is Upper Modifier? Is Connected with Header Line? Generate Unique Character Code Character Code = Criteria code + Character Sequence (Horizontal & Vertical) Identify the Character according to the Character Code Construction of Words and Sentences Fig. 1: The outline of the proposed method C. Segmentation of a text line into Words In Bangla script, characters and symbols of a word are joined together by a header line. As a result, word boundaries are rarely ambiguous. The gap in header line creates no problem for word boundary identification. A vertical histogram of a text line is made and every gap of two or more pixels in the histogram is taken to be the word delimiter.
4 72 Md. Mahbubul Haque, A.Q. M. Saiful Islam, Md. Mahadi Hasan and M. Shahidur Rahman D. Segmentation of a word into symbols and characters The header line joins the characters of a word together which makes the segmentation of a word into its constituent characters slightly difficult. The region above the header line contains upper modifier symbols. The region below the header line contains core characters and lower modifier symbols. Before segmentation can progress any further, header line must be identified. Header line is easily identified as it is the most dominating horizontal line in a word. After the header line is removed, vertical gap separates the top modifiers from the neighbors. The characters below header line are also separated from their neighbors by vertical gaps. Step 1 Step 2 Step 3 Vertical histogram Step 4 Step 5 Step 6 Fig. 2: Character segmentation from text line E. Segmentation of Modifiers Modifiers suitable for using at left and right position of a core character can be easily segmented from the vertical projection profile (e.g. evsjv `k in Fig. 2). However, decomposing upper and lower modifier is relatively tedious. Horizontal and vertical projections of upper modifiers are taken starting from (0,0) to (char width, header- line position). If an upper modifier joins another left or right modifier in the header line, then concatenation of both is considered a single modifier (as in the cases of ÕwÕ, x ). Fig. 3 shows an upper modifier, its horizontal and vertical projections and concatenation with another (left or right) modifier. Upper modifier Horizontal Projection Vertical Projection Concatenated Fig. 3: Segmentation process of the upper modifier
5 Improved Optical Recognition of Bangla Characters 73 Lower modifiers may have a thick joining, weak joining or no joining at all. If the lower modifier has no joining with the core character, it can be readily separated by an outer boundary traversal (Fig. 4a). But if they have a thick or weak joining, a threshold of the height of the character is determined first and then the point of lowest density investigated below the threshold height. This position is treated as the segmentation point where the core character is separated from the modifier. Fig. 4b shows the case. Gap Segmented Thick Segmentation Point Segmented (a) No joining (b) Thick joining Fig. 4: Segmentation process of the lower modifier 4. Recognition of Characters A. After the word is segmented, the recognition algorithm is applied to characters. The salient features employed to the recognition process are summarized below. i. Existence of the header line In Bangla script, some characters have complete header line whereas some doesn t have. This can be the first criterion to categorize the characters which make the recognition process easier and faster. Fig. 5: Recognition of the header line ii. Position of the Vertical Bar Bangla characters often exhibit vertical bar starting from header line and spans up to the whole core strip height. Those characters can be classified as Pre Bar characters Mid Bar characters End Bar characters, and Non Bar characters Fig. 6: Recognition of the placement of the vertical bars
6 74 Md. Mahbubul Haque, A.Q. M. Saiful Islam, Md. Mahadi Hasan and M. Shahidur Rahman iii. Top strip Some characters have portion at the top strip, which is absent in some other characters. Fig. 7: Recognition of the existence of the top strip iv. Connection to the Header Line Sometimes the header line is connected with the rest part of the characters whereas some other characters do not touch the header line such as in Z Ó and ÓfÓ. Fig. 8: Verify if a character touches the header line v. Dot as a lower modifier Recognition of the presence of the dots at the lower part of the character further helps classify the characters. Fig. 9: Recognition of the existence of the dot at the lower part of the character In addition with the features stated above, some other classifying features can be used, for example, extra connections with he header line (m, b), connection place with end-bar (b, m) and number of connections with end-bar (b, l), for improved accuracy. B. Horizontal Zero Crossings Image of a character is treated as an array of pixels. A black pixel is expressed by a 1 and a white pixel by 0. After tracing the whole array row by row, number of transitions from black pixel to white pixel is recorded for each row. Let N i be the number of transitions for ith row. The sequence N i, 0<= i<= n, where n is the pixel height of the character, is referred to as horizontal zero crossing sequence S.
7 Improved Optical Recognition of Bangla Characters 75 Fig. 10: The process of horizontal zero-crossings A character is divided into n horizontal segments of equal height. Each segment is represented by the number of zero-transitions that is most frequent in the segment. We divide the sequence S into three sub-sequences of equal length, S1,S2, and S3. For each subsequence the most frequent number is stored in S1-Most, S2-Most, S3-Most. The feature vector consists of (S1- Most, S2 Most, S3- Most) which is then used to filter out the false character candidates. The following table shows the obtained sequence, which are same for all the 3 fontsize of ÕKÕ and L' which are 131 and 122, respectively. Similar results have been obtained using four different font faces. Table 1: Sequence S for four samples of character ÕKÕ and L' Sequence S for ÕKÕ Sequence s for Õ L Õ => => => => => => => => => => => => 122 C. Vertical Zero Crossings Vertical zero crossing is as same as the horizontal crossing where the image block is divided into several vertical segments. The process of vertical zero crossing yields another 3-digit code. Fig. 11: The process of vertical zero-crossings D. Recognition from the Generated Code sequence The processes in Sections A, B, and C produce 11-digit (5 digits of category code, 3 digits of horizontal, and 3 digits from vertical zero crossings) code that is employed for identifying a character.
8 76 Md. Mahbubul Haque, A.Q. M. Saiful Islam, Md. Mahadi Hasan and M. Shahidur Rahman Criteria Code (5 Digit) Header 1 = Yes 0 = No Bar 1=Pre bar 2=Mid bar 3=End bar 4=No bar Table 2: Generating character code using the classifying features Upper Modifier 1= Yes 0= No Connection with header 1 = Yes 0 = No Existence of dot 1 = Yes 0 = No Horizontal code (3 digit) Vertical code (3 digit) Unicode K' 004b L 004c The obtained character code can vary according to font face and size. In the next section we presented experimental results for font-face SutonnyMJ. 5. Recognition Results Four snapshots from the program output are shown in Fig. 5. As seen in the figure, four type of input (same size basic characters, different size basic characters, same size conjuncts, different size conjuncts) have been used to verify the performance of the proposed method. The font size varies from 14 to 32-point. Experimental setup and the obtained recognition accuracy are summarized in Table 3. Table 3: Recognition accuracy of the proposed method Font size No of Characters (basic) Recognition accuracy (%) No of Characters (conjuncts) Conclusion Recognition accuracy (%) In this paper, we have presented a recognition method of Bangla characters. The results presented here using frequently used basic characters and conjuncts are satisfactory. The accuracy, however, decreases when recognizing from a complete sentence. Research is going on to improve the accuracy rate on characters segmented from sentences. Primarily, we concern only the regular-shaped font of varying sizes. By incorporating a neural network based post processor, we aim to build a complete working system on Bangla OCR addressing more variations in printed Bangla text.
9 Improved Optical Recognition of Bangla Characters 77 References [1] S. Kahan, T. Pavlidis, and H.S. Baird, On the recognition of printed characters of any font and size, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 9(2), pp , [2] M-K. Kim, Y-B. Kwon, "Multi-font and Multi-Size Character Recognition Based on the Sampling and Quantization of an Unwrapped Contour," Proc. of Intl. Conf. on Pattern Recognition, vol. 3, pp.170, Vienna, Aug 25-29, [3] U. Pal, B. B. Chaudhuri, "OCR in Bangla: an Indo-Bangladeshi Language", Proc. of ICPR, pp , Jerusalem, Israel, [4] B. B. Chaudhuri, U. Pal, "An OCR System to Read Two Indian Language Scripts: Bangla and Devnagari(Hindi)", Proc. of 4th ICDAR, vol.2, pp , Ulm, Germany, [5] U. Garain and B. B. Chaudhuri, Segmentation of Touching Characters in Printed Devnagari and Bangla Scripts Using Fuzzy Multifactorial Analysis, IEEE Transactions on Systems, Man and Cybernetics, vol. 32, pp , Nov., [6] B. B. Chaudhuri, "Digital document processing: major directions and recent advances", Springer, London, [7] Mohammed Alamgir, Md. Khademul Islam Molla, and Muhammed Zafar Iqbal, Bangla Character Recognition System, Proc. of ICCIT 99, pp , 3-5 December [8] Md. Khademul Islam Molla and Kamrul Hasan Talukder, Bangla Number Extraction and Recognition from Document Image, Proc. of ICCIT, pp , Dhaka, [9] A. A. Chowdhury, Ejaj Ahmed, S. Ahmed, S. Hossain and C. M. Rahman, "Optical Character Recognition of Bangla Characters using neural network: A better approach". Proc. of 2nd ICEE, [10] J. U. Mahmud, M. F. Raihan and C. M. Rahman, "A Complete OCR System for Continuous Bangla Characters", Proc. of the Conf. on Convergent Technologies, [11] S. M. Shoeb Shatil and Mumit Khan, Minimally Segmenting High Performance Bangla OCR using Kohonen Network, Proc. of 9th ICCIT, [12] Md. Abdus Sattar, Khaled Mahmud, Humayun Arafat and A F M Noor Uz Zaman, "Segmenting Bangla Text for Optical Recognition, Proc. of ICCIT, [13] Md. Abul Hasnat, S M Murtoza Habib and Mumit Khan, "A High Performance Domain Specific OCR for Bangla Script", Proc. of Int. Joint Conf. on Computer, Information, and Systems Sciences, and Engineering (CISSE), [14] Md. Abul Hasnat and Mumit Khan, Elimination of Splitting Errors in Printed Bangla Scripts Proc. of the Conference on Language & Technology 2009, January, 2009 [15] A. Elahi, Bangla OCR: A Complete Working System, B.Sc. Thesis, 2006, Dept. Computer Science & Engineering, Shahjajlal University of Science & Technology, Sylhet, Bangladesh.
10 78 Md. Mahbubul Haque, A.Q. M. Saiful Islam, Md. Mahadi Hasan and M. Shahidur Rahman Fig. 5: Snapshots form the program output Submitted: 8 th July 2009; Accepted for Publication: 25 th January, 2010.
Optical Character Recognition (OCR) for Printed Devnagari Script Using Artificial Neural Network
International Journal of Computer Science & Communication Vol. 1, No. 1, January-June 2010, pp. 91-95 Optical Character Recognition (OCR) for Printed Devnagari Script Using Artificial Neural Network Raghuraj
More informationMinimally Segmenting High Performance Bangla Optical Character Recognition Using Kohonen Network
Minimally Segmenting High Performance Bangla Optical Character Recognition Using Kohonen Network Adnan Mohammad Shoeb Shatil and Mumit Khan Computer Science and Engineering, BRAC University, Dhaka, Bangladesh
More informationIsolated Handwritten Words Segmentation Techniques in Gurmukhi Script
Isolated Handwritten Words Segmentation Techniques in Gurmukhi Script Galaxy Bansal Dharamveer Sharma ABSTRACT Segmentation of handwritten words is a challenging task primarily because of structural features
More informationA New Technique for Segmentation of Handwritten Numerical Strings of Bangla Language
I.J. Information Technology and Computer Science, 2013, 05, 38-43 Published Online April 2013 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijitcs.2013.05.05 A New Technique for Segmentation of Handwritten
More informationSegmentation free Bangla OCR using HMM: Training and Recognition
Segmentation free Bangla OCR using HMM: Training and Recognition Md. Abul Hasnat, S.M. Murtoza Habib, Mumit Khan BRAC University, Bangladesh mhasnat@gmail.com, murtoza@gmail.com, mumit@bracuniversity.ac.bd
More informationOptical Character Recognition For Bangla Documents Using HMM
Optical Character Recognition For Bangla Documents Using HMM Md. Sheemam Monjel and Mumit Khan Dept. of CSE, BRAC University, Dhaka, Bangladesh. sheemam@bracuniversity.net, mumit@bracuniversity.net Abstract
More informationSkew Angle Detection of Bangla Script using Radon Transform
Skew Angle Detection of Bangla Script using Radon Transform S. M. Murtoza Habib, Nawsher Ahamed Noor and Mumit Khan Center for Research on Bangla Language Processing, BRAC University, Dhaka, Bangladesh.
More informationA Survey of Problems of Overlapped Handwritten Characters in Recognition process for Gurmukhi Script
A Survey of Problems of Overlapped Handwritten Characters in Recognition process for Gurmukhi Script Arwinder Kaur 1, Ashok Kumar Bathla 2 1 M. Tech. Student, CE Dept., 2 Assistant Professor, CE Dept.,
More informationFRAGMENTATION OF HANDWRITTEN TOUCHING CHARACTERS IN DEVANAGARI SCRIPT
International Journal of Information Technology, Modeling and Computing (IJITMC) Vol. 2, No. 1, February 2014 FRAGMENTATION OF HANDWRITTEN TOUCHING CHARACTERS IN DEVANAGARI SCRIPT Shuchi Kapoor 1 and Vivek
More informationSegmentation of Bangla Handwritten Text
Thesis Report Segmentation of Bangla Handwritten Text Submitted By: Sabbir Sadik ID:09301027 Md. Numan Sarwar ID: 09201027 CSE Department BRAC University Supervisor: Professor Dr. Mumit Khan Date: 13 th
More informationAn Efficient Character Segmentation Based on VNP Algorithm
Research Journal of Applied Sciences, Engineering and Technology 4(24): 5438-5442, 2012 ISSN: 2040-7467 Maxwell Scientific organization, 2012 Submitted: March 18, 2012 Accepted: April 14, 2012 Published:
More informationOptical Character Recognition of Bangla Characters using neural network: A better approach
Optical Character Recognition of Bangla Characters using neural network: A better approach Ahmed Asif Chowdhury, Ejaj Ahmed, Shameem Ahmed, Shohrab Hossain, Chowdhury Mofizur Rahman Department of Computer
More informationSegmentation Based Optical Character Recognition for Handwritten Marathi characters
Segmentation Based Optical Character Recognition for Handwritten Marathi characters Madhav Vaidya 1, Yashwant Joshi 2,Milind Bhalerao 3 Department of Information Technology 1 Department of Electronics
More informationA System for Joining and Recognition of Broken Bangla Numerals for Indian Postal Automation
A System for Joining and Recognition of Broken Bangla Numerals for Indian Postal Automation K. Roy, U. Pal and B. B. Chaudhuri CVPR Unit; Indian Statistical Institute, Kolkata-108; India umapada@isical.ac.in
More informationOCR For Handwritten Marathi Script
International Journal of Scientific & Engineering Research Volume 3, Issue 8, August-2012 1 OCR For Handwritten Marathi Script Mrs.Vinaya. S. Tapkir 1, Mrs.Sushma.D.Shelke 2 1 Maharashtra Academy Of Engineering,
More informationSegmentation free Bangla OCR using HMM: Training and Recognition
Segmentation free Bangla OCR using HMM: Training and Recognition Md. Abul Hasnat BRAC University, Bangladesh mhasnat@gmail.com S. M. Murtoza Habib BRAC University, Bangladesh murtoza@gmail.com Mumit Khan
More informationRecognition of Unconstrained Malayalam Handwritten Numeral
Recognition of Unconstrained Malayalam Handwritten Numeral U. Pal, S. Kundu, Y. Ali, H. Islam and N. Tripathy C VPR Unit, Indian Statistical Institute, Kolkata-108, India Email: umapada@isical.ac.in Abstract
More informationResearch Report on Bangla OCR Training and Testing Methods
Research Report on Bangla OCR Training and Testing Methods Md. Abul Hasnat BRAC University, Dhaka, Bangladesh. hasnat@bracu.ac.bd Abstract In this paper we present the training and recognition mechanism
More informationIndian Multi-Script Full Pin-code String Recognition for Postal Automation
2009 10th International Conference on Document Analysis and Recognition Indian Multi-Script Full Pin-code String Recognition for Postal Automation U. Pal 1, R. K. Roy 1, K. Roy 2 and F. Kimura 3 1 Computer
More informationBANGLA OPTICAL CHARACTER RECOGNITION. A Thesis. Submitted to the Department of Computer Science and Engineering. BRAC University. S. M.
BANGLA OPTICAL CHARACTER RECOGNITION A Thesis Submitted to the Department of Computer Science and Engineering of BRAC University by S. M. Murtoza Habib Student ID: 01201071 In Partial Fulfillment of the
More informationBuilding Multi Script OCR for Brahmi Scripts: Selection of Efficient Features
Building Multi Script OCR for Brahmi Scripts: Selection of Efficient Features Md. Abul Hasnat Center for Research on Bangla Language Processing (CRBLP) Center for Research on Bangla Language Processing
More informationFine Classification of Unconstrained Handwritten Persian/Arabic Numerals by Removing Confusion amongst Similar Classes
2009 10th International Conference on Document Analysis and Recognition Fine Classification of Unconstrained Handwritten Persian/Arabic Numerals by Removing Confusion amongst Similar Classes Alireza Alaei
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK HANDWRITTEN DEVANAGARI CHARACTERS RECOGNITION THROUGH SEGMENTATION AND ARTIFICIAL
More informationA two-stage approach for segmentation of handwritten Bangla word images
A two-stage approach for segmentation of handwritten Bangla word images Ram Sarkar, Nibaran Das, Subhadip Basu, Mahantapas Kundu, Mita Nasipuri #, Dipak Kumar Basu Computer Science & Engineering Department,
More informationA FEATURE BASED CHAIN CODE METHOD FOR IDENTIFYING PRINTED BENGALI CHARACTERS
A FEATURE BASED CHAIN CODE METHOD FOR IDENTIFYING PRINTED BENGALI CHARACTERS Ankita Sikdar 1, Payal Roy 1, Somdeep Mukherjee 1, Moumita Das 1 and Sreeparna Banerjee 2 1 Department of Computer Science and
More informationBangla Character Recognition System is Developed by using Automatic Feature Extraction and XOR Operation
Global Journal of Computer Science and Technology Graphics & Vision Volume 13 Issue 2 Version 1.0 Year 2013 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc.
More informationA Segmentation Free Approach to Arabic and Urdu OCR
A Segmentation Free Approach to Arabic and Urdu OCR Nazly Sabbour 1 and Faisal Shafait 2 1 Department of Computer Science, German University in Cairo (GUC), Cairo, Egypt; 2 German Research Center for Artificial
More informationMorphological Approach for Segmentation of Scanned Handwritten Devnagari Text
Abstract In this paper we present a system towards the of Hindi Handwritten Devnagari Text. Segmentation of script is essential for handwritten script recognition. This system deals with of (matras) and
More informationWord-wise Hand-written Script Separation for Indian Postal automation
Word-wise Hand-written Script Separation for Indian Postal automation K. Roy U. Pal Dept. of Comp. Sc. & Engg. West Bengal University of Technology, Sector 1, Saltlake City, Kolkata-64, India Abstract
More informationEnhancing the Character Segmentation Accuracy of Bangla OCR using BPNN
Enhancing the Character Segmentation Accuracy of Bangla OCR using BPNN Shamim Ahmed 1, Mohammod Abul Kashem 2 1 M.S. Student, Department of Computer Science and Engineering, Dhaka University of Engineering
More informationLayout Segmentation of Scanned Newspaper Documents
, pp-05-10 Layout Segmentation of Scanned Newspaper Documents A.Bandyopadhyay, A. Ganguly and U.Pal CVPR Unit, Indian Statistical Institute 203 B T Road, Kolkata, India. Abstract: Layout segmentation algorithms
More informationComplementary Features Combined in a MLP-based System to Recognize Handwritten Devnagari Character
Journal of Information Hiding and Multimedia Signal Processing 2011 ISSN 2073-4212 Ubiquitous International Volume 2, Number 1, January 2011 Complementary Features Combined in a MLP-based System to Recognize
More informationCharacter Recognition
Character Recognition 5.1 INTRODUCTION Recognition is one of the important steps in image processing. There are different methods such as Histogram method, Hough transformation, Neural computing approaches
More informationResearch Report on Bangla Optical Character Recognition Using Kohonen etwork
Research Report on Bangla Optical Character Recognition Using Kohonen etwork Adnan Md. Shoeb Shatil Center for Research on Bangla Language Processing BRAC University, Dhaka, Bangladesh. shoeb_shatil@yahoo.com
More informationSEGMENTATION OF CHARACTERS WITHOUT MODIFIERS FROM A PRINTED BANGLA TEXT
SEGMENTATION OF CHARACTERS WITHOUT MODIFIERS FROM A PRINTED BANGLA TEXT ABSTRACT Rupak Bhattacharyya et al. (Eds) : ACER 2013, pp. 11 24, 2013. CS & IT-CSCP 2013 Fakruddin Ali Ahmed Department of Computer
More informationA Review on Different Character Segmentation Techniques for Handwritten Gurmukhi Scripts
WWJMRD2017; 3(10): 162-166 www.wwjmrd.com International Journal Peer Reviewed Journal Refereed Journal Indexed Journal UGC Approved Journal Impact Factor MJIF: 4.25 e-issn: 2454-6615 Manas Kaur Research
More informationA Complete Workflow for Development of Bangla OCR
A Complete Workflow for Development of Bangla OCR Farjana Yeasmin Omee Dept. of Computer Science & Engineering Shahjalal University of Science & Technology, Sylhet Shiam Shabbir Himel Dept. of Computer
More informationOnline Bangla Handwriting Recognition System
1 Online Bangla Handwriting Recognition System K. Roy Dept. of Comp. Sc. West Bengal University of Technology, BF 142, Saltlake, Kolkata-64, India N. Sharma, T. Pal and U. Pal Computer Vision and Pattern
More informationKhmer OCR for Limon R1 Size 22 Report
PAN Localization Project Project No: Ref. No: PANL10n/KH/Report/phase2/002 Khmer OCR for Limon R1 Size 22 Report 09 July, 2009 Prepared by: Mr. ING LENG IENG Cambodia Country Component PAN Localization
More informationRecognition of online captured, handwritten Tamil words on Android
Recognition of online captured, handwritten Tamil words on Android A G Ramakrishnan and Bhargava Urala K Medical Intelligence and Language Engineering (MILE) Laboratory, Dept. of Electrical Engineering,
More informationAn Accurate and Efficient System for Segmenting Machine-Printed Text. Yi Lu, Beverly Haist, Laurel Harmon, John Trenkle and Robert Vogt
An Accurate and Efficient System for Segmenting Machine-Printed Text Yi Lu, Beverly Haist, Laurel Harmon, John Trenkle and Robert Vogt Environmental Research Institute of Michigan P. O. Box 134001 Ann
More informationScanner Parameter Estimation Using Bilevel Scans of Star Charts
ICDAR, Seattle WA September Scanner Parameter Estimation Using Bilevel Scans of Star Charts Elisa H. Barney Smith Electrical and Computer Engineering Department Boise State University, Boise, Idaho 8375
More informationComparative Performance Analysis of Feature(S)- Classifier Combination for Devanagari Optical Character Recognition System
Comparative Performance Analysis of Feature(S)- Classifier Combination for Devanagari Optical Character Recognition System Jasbir Singh Department of Computer Science Punjabi University Patiala, India
More informationHandwriting segmentation of unconstrained Oriya text
Sādhanā Vol. 31, Part 6, December 2006, pp. 755 769. Printed in India Handwriting segmentation of unconstrained Oriya text N TRIPATHY and U PAL Computer Vision and Pattern Recognition Unit, Indian Statistical
More informationKeyword Spotting in Document Images through Word Shape Coding
2009 10th International Conference on Document Analysis and Recognition Keyword Spotting in Document Images through Word Shape Coding Shuyong Bai, Linlin Li and Chew Lim Tan School of Computing, National
More informationSegmentation of Characters of Devanagari Script Documents
WWJMRD 2017; 3(11): 253-257 www.wwjmrd.com International Journal Peer Reviewed Journal Refereed Journal Indexed Journal UGC Approved Journal Impact Factor MJIF: 4.25 e-issn: 2454-6615 Manpreet Kaur Research
More informationA Brief Study of Feature Extraction and Classification Methods Used for Character Recognition of Brahmi Northern Indian Scripts
25 A Brief Study of Feature Extraction and Classification Methods Used for Character Recognition of Brahmi Northern Indian Scripts Rohit Sachdeva, Asstt. Prof., Computer Science Department, Multani Mal
More informationSEVERAL METHODS OF FEATURE EXTRACTION TO HELP IN OPTICAL CHARACTER RECOGNITION
SEVERAL METHODS OF FEATURE EXTRACTION TO HELP IN OPTICAL CHARACTER RECOGNITION Binod Kumar Prasad * * Bengal College of Engineering and Technology, Durgapur, W.B., India. Rajdeep Kundu 2 2 Bengal College
More informationLocalization, Extraction and Recognition of Text in Telugu Document Images
Localization, Extraction and Recognition of Text in Telugu Document Images Atul Negi Department of CIS University of Hyderabad Hyderabad 500046, India atulcs@uohyd.ernet.in K. Nikhil Shanker Department
More informationA Study to Recognize Printed Gujarati Characters Using Tesseract OCR
A Study to Recognize Printed Gujarati Characters Using Tesseract OCR Milind Kumar Audichya 1, Jatinderkumar R. Saini 2 1, 2 Computer Science, Gujarat Technological University Abstract: Optical Character
More informationHandwritten Devanagari Character Recognition Model Using Neural Network
Handwritten Devanagari Character Recognition Model Using Neural Network Gaurav Jaiswal M.Sc. (Computer Science) Department of Computer Science Banaras Hindu University, Varanasi. India gauravjais88@gmail.com
More informationLine and Word Segmentation Approach for Printed Documents
Line and Word Segmentation Approach for Printed Documents Nallapareddy Priyanka Computer Vision and Pattern Recognition Unit Indian Statistical Institute, 203 B.T. Road, Kolkata-700108, India Srikanta
More informationOne Dim~nsional Representation Of Two Dimensional Information For HMM Based Handwritten Recognition
One Dim~nsional Representation Of Two Dimensional Information For HMM Based Handwritten Recognition Nafiz Arica Dept. of Computer Engineering, Middle East Technical University, Ankara,Turkey nafiz@ceng.metu.edu.
More informationRecognition of Off-Line Handwritten Devnagari Characters Using Quadratic Classifier
Recognition of Off-Line Handwritten Devnagari Characters Using Quadratic Classifier N. Sharma, U. Pal*, F. Kimura**, and S. Pal Computer Vision and Pattern Recognition Unit, Indian Statistical Institute
More informationPart-Based Skew Estimation for Mathematical Expressions
Soma Shiraishi, Yaokai Feng, and Seiichi Uchida shiraishi@human.ait.kyushu-u.ac.jp {fengyk,uchida}@ait.kyushu-u.ac.jp Abstract We propose a novel method for the skew estimation on text images containing
More informationGoing digital Challenge & solutions in a newspaper archiving project. Andrey Lomov ATAPY Software Russia
Going digital Challenge & solutions in a newspaper archiving project Andrey Lomov ATAPY Software Russia Problem Description Poor recognition results caused by low image quality: noise, white holes in characters,
More informationABJAD: AN OFF-LINE ARABIC HANDWRITTEN RECOGNITION SYSTEM
ABJAD: AN OFF-LINE ARABIC HANDWRITTEN RECOGNITION SYSTEM RAMZI AHMED HARATY and HICHAM EL-ZABADANI Lebanese American University P.O. Box 13-5053 Chouran Beirut, Lebanon 1102 2801 Phone: 961 1 867621 ext.
More informationA New Algorithm for Detecting Text Line in Handwritten Documents
A New Algorithm for Detecting Text Line in Handwritten Documents Yi Li 1, Yefeng Zheng 2, David Doermann 1, and Stefan Jaeger 1 1 Laboratory for Language and Media Processing Institute for Advanced Computer
More informationOn Segmentation of Documents in Complex Scripts
On Segmentation of Documents in Complex Scripts K. S. Sesh Kumar, Sukesh Kumar and C. V. Jawahar Centre for Visual Information Technology International Institute of Information Technology, Hyderabad, India
More informationScene Text Detection Using Machine Learning Classifiers
601 Scene Text Detection Using Machine Learning Classifiers Nafla C.N. 1, Sneha K. 2, Divya K.P. 3 1 (Department of CSE, RCET, Akkikkvu, Thrissur) 2 (Department of CSE, RCET, Akkikkvu, Thrissur) 3 (Department
More informationA Generalized Method to Solve Text-Based CAPTCHAs
A Generalized Method to Solve Text-Based CAPTCHAs Jason Ma, Bilal Badaoui, Emile Chamoun December 11, 2009 1 Abstract We present work in progress on the automated solving of text-based CAPTCHAs. Our method
More informationInvariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction
Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction Stefan Müller, Gerhard Rigoll, Andreas Kosmala and Denis Mazurenok Department of Computer Science, Faculty of
More informationImage Normalization and Preprocessing for Gujarati Character Recognition
334 Image Normalization and Preprocessing for Gujarati Character Recognition Jayashree Rajesh Prasad Department of Computer Engineering, Sinhgad College of Engineering, University of Pune, Pune, Mahaashtra
More informationKeywords Connected Components, Text-Line Extraction, Trained Dataset.
Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Language Independent
More informationSlant Correction using Histograms
Slant Correction using Histograms Frank de Zeeuw Bachelor s Thesis in Artificial Intelligence Supervised by Axel Brink & Tijn van der Zant July 12, 2006 Abstract Slant is one of the characteristics that
More informationA New Approach to Detect and Extract Characters from Off-Line Printed Images and Text
Available online at www.sciencedirect.com Procedia Computer Science 17 (2013 ) 434 440 Information Technology and Quantitative Management (ITQM2013) A New Approach to Detect and Extract Characters from
More informationData Hiding in Binary Text Documents 1. Q. Mei, E. K. Wong, and N. Memon
Data Hiding in Binary Text Documents 1 Q. Mei, E. K. Wong, and N. Memon Department of Computer and Information Science Polytechnic University 5 Metrotech Center, Brooklyn, NY 11201 ABSTRACT With the proliferation
More informationText Enhancement with Asymmetric Filter for Video OCR. Datong Chen, Kim Shearer and Hervé Bourlard
Text Enhancement with Asymmetric Filter for Video OCR Datong Chen, Kim Shearer and Hervé Bourlard Dalle Molle Institute for Perceptual Artificial Intelligence Rue du Simplon 4 1920 Martigny, Switzerland
More informationVision. OCR and OCV Application Guide OCR and OCV Application Guide 1/14
Vision OCR and OCV Application Guide 1.00 OCR and OCV Application Guide 1/14 General considerations on OCR Encoded information into text and codes can be automatically extracted through a 2D imager device.
More informationInput sensitive thresholding for ancient Hebrew manuscript
Pattern Recognition Letters 26 (2005) 1168 1173 www.elsevier.com/locate/patrec Input sensitive thresholding for ancient Hebrew manuscript Itay Bar-Yosef * Department of Computer Science, Ben Gurion University,
More informationSkew Angle Detection of Bangla Script using Radon Transform
Sew Angle Detection of Bangla Script using Radon Transform S. M. Murtoza Habib, Nawsher Ahamed Noor and Mumit Khan Center for Research on Bangla Language Processing, BRAC University, Dhaa, Bangladesh.
More informationTime Stamp Detection and Recognition in Video Frames
Time Stamp Detection and Recognition in Video Frames Nongluk Covavisaruch and Chetsada Saengpanit Department of Computer Engineering, Chulalongkorn University, Bangkok 10330, Thailand E-mail: nongluk.c@chula.ac.th
More informationCharacter Segmentation for Telugu Image Document using Multiple Histogram Projections
Global Journal of Computer Science and Technology Graphics & Vision Volume 13 Issue 5 Version 1.0 Year 2013 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc.
More informationLigature-based font size independent OCR for Noori Nastalique writing style
Ligature-based font size independent OCR for Noori Nastalique writing style Qurat ul Ain Akram Sarmad Hussain Center for Language Engineering, Al-Khawarizmi Institute of Computer Science University of
More informationCHAPTER 8 COMPOUND CHARACTER RECOGNITION USING VARIOUS MODELS
CHAPTER 8 COMPOUND CHARACTER RECOGNITION USING VARIOUS MODELS 8.1 Introduction The recognition systems developed so far were for simple characters comprising of consonants and vowels. But there is one
More informationSegmentation of Images
Segmentation of Images SEGMENTATION If an image has been preprocessed appropriately to remove noise and artifacts, segmentation is often the key step in interpreting the image. Image segmentation is a
More informationRecognition-based Segmentation of Nom Characters from Body Text Regions of Stele Images Using Area Voronoi Diagram
Author manuscript, published in "International Conference on Computer Analysis of Images and Patterns - CAIP'2009 5702 (2009) 205-212" DOI : 10.1007/978-3-642-03767-2 Recognition-based Segmentation of
More informationSkew Detection for Complex Document Images Using Fuzzy Runlength
Skew Detection for Complex Document Images Using Fuzzy Runlength Zhixin Shi and Venu Govindaraju Center of Excellence for Document Analysis and Recognition(CEDAR) State University of New York at Buffalo,
More informationScanner Parameter Estimation Using Bilevel Scans of Star Charts
Boise State University ScholarWorks Electrical and Computer Engineering Faculty Publications and Presentations Department of Electrical and Computer Engineering 1-1-2001 Scanner Parameter Estimation Using
More informationText Extraction from Natural Scene Images and Conversion to Audio in Smart Phone Applications
Text Extraction from Natural Scene Images and Conversion to Audio in Smart Phone Applications M. Prabaharan 1, K. Radha 2 M.E Student, Department of Computer Science and Engineering, Muthayammal Engineering
More informationAn Accurate Method for Skew Determination in Document Images
DICTA00: Digital Image Computing Techniques and Applications, 1 January 00, Melbourne, Australia. An Accurate Method for Skew Determination in Document Images S. Lowther, V. Chandran and S. Sridharan Research
More informationA Statistical approach to line segmentation in handwritten documents
A Statistical approach to line segmentation in handwritten documents Manivannan Arivazhagan, Harish Srinivasan and Sargur Srihari Center of Excellence for Document Analysis and Recognition (CEDAR) University
More informationWord Slant Estimation using Non-Horizontal Character Parts and Core-Region Information
2012 10th IAPR International Workshop on Document Analysis Systems Word Slant using Non-Horizontal Character Parts and Core-Region Information A. Papandreou and B. Gatos Computational Intelligence Laboratory,
More informationA new approach to reference point location in fingerprint recognition
A new approach to reference point location in fingerprint recognition Piotr Porwik a) and Lukasz Wieclaw b) Institute of Informatics, Silesian University 41 200 Sosnowiec ul. Bedzinska 39, Poland a) porwik@us.edu.pl
More informationRough Set Approach to Unsupervised Neural Network based Pattern Classifier
Rough Set Approach to Unsupervised Neural based Pattern Classifier Ashwin Kothari, Member IAENG, Avinash Keskar, Shreesha Srinath, and Rakesh Chalsani Abstract Early Convergence, input feature space with
More informationHandwritten Devanagari Character Recognition
Handwritten Devanagari Character Recognition Akhil Deshmukh, Rahul Meshram, Sachin Kendre, Kunal Shah Department of Computer Engineering Sinhgad Institute of Technology (SIT) Lonavala University of Pune,
More informationHCR Using K-Means Clustering Algorithm
HCR Using K-Means Clustering Algorithm Meha Mathur 1, Anil Saroliya 2 Amity School of Engineering & Technology Amity University Rajasthan, India Abstract: Hindi is a national language of India, there are
More informationDATABASE DEVELOPMENT OF HISTORICAL DOCUMENTS: SKEW DETECTION AND CORRECTION
DATABASE DEVELOPMENT OF HISTORICAL DOCUMENTS: SKEW DETECTION AND CORRECTION S P Sachin 1, Banumathi K L 2, Vanitha R 3 1 UG, Student of Department of ECE, BIET, Davangere, (India) 2,3 Assistant Professor,
More informationRecognition of Gurmukhi Text from Sign Board Images Captured from Mobile Camera
International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 17 (2014), pp. 1839-1845 International Research Publications House http://www. irphouse.com Recognition of
More informationTEXT DETECTION AND RECOGNITION FROM IMAGES OF NATURAL SCENE
TEXT DETECTION AND RECOGNITION FROM IMAGES OF NATURAL SCENE Arti A. Gawade and R. V. Dagade Department of Computer Engineering MMCOE Savitraibai Phule Pune University, India ABSTRACT: Devanagari is the
More informationN.Priya. Keywords Compass mask, Threshold, Morphological Operators, Statistical Measures, Text extraction
Volume, Issue 8, August ISSN: 77 8X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Combined Edge-Based Text
More informationDevanagari Handwriting Recognition and Editing Using Neural Network
Devanagari Handwriting Recognition and Editing Using Neural Network Sohan Lal Sahu RSR Rungta College of Engineering & Technology (RSR-RCET), Bhilai 490024 Abstract- Character recognition plays an important
More informationLECTURE 6 TEXT PROCESSING
SCIENTIFIC DATA COMPUTING 1 MTAT.08.042 LECTURE 6 TEXT PROCESSING Prepared by: Amnir Hadachi Institute of Computer Science, University of Tartu amnir.hadachi@ut.ee OUTLINE Aims Character Typology OCR systems
More informationSinhala Handwriting Recognition Mechanism Using Zone Based Feature Extraction
Sinhala Handwriting Recognition Mechanism Using Zone Based Feature Extraction 10 K.A.K.N.D. Dharmapala 1, W.P.M.V. Wijesooriya 2, C.P. Chandrasekara 3, U.K.A.U. Rathnapriya 4, L. Ranathunga 5 Department
More informationPixels. Orientation π. θ π/2 φ. x (i) A (i, j) height. (x, y) y(j)
4th International Conf. on Document Analysis and Recognition, pp.142-146, Ulm, Germany, August 18-20, 1997 Skew and Slant Correction for Document Images Using Gradient Direction Changming Sun Λ CSIRO Math.
More informationStructural and Syntactic Techniques for Recognition of Ethiopic Characters
Structural and Syntactic Techniques for Recognition of Ethiopic Characters Yaregal Assabie and Josef Bigun School of Information Science, Computer and Electrical Engineering Halmstad University, SE-301
More informationCursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network
Cursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network Utkarsh Dwivedi 1, Pranjal Rajput 2, Manish Kumar Sharma 3 1UG Scholar, Dept. of CSE, GCET, Greater Noida,
More informationHandwritten Marathi Character Recognition on an Android Device
Handwritten Marathi Character Recognition on an Android Device Tanvi Zunjarrao 1, Uday Joshi 2 1MTech Student, Computer Engineering, KJ Somaiya College of Engineering,Vidyavihar,India 2Associate Professor,
More informationUniversal Graphical User Interface for Online Handwritten Character Recognition
Universal Graphical User Interface for Online Handwritten Character Recognition Thesis submitted in partial fulfillment of the requirements for the award of degree of Master of Engineering in Computer
More informationHilditch s Algorithm Based Tamil Character Recognition
Hilditch s Algorithm Based Tamil Character Recognition V. Karthikeyan Department of ECE, SVS College of Engineering Coimbatore, India, Karthick77keyan@gmail.com Abstract-Character identification plays
More information