NASTAALIGH HANDWRITTEN WORD RECOGNITION USING A CONTINUOUS-DENSITY VARIABLE-DURATION HMM


Reza Safabakhsh and Peyman Adibi
Computational Vision/Intelligence Laboratory, Computer Engineering Department, Amirkabir University of Technology, Hafez Avenue, Tehran, Iran

SUMMARY (Arabic abstract, in English translation)

In this paper we present a complete system for the recognition of handwritten Farsi Nastaaligh words using a hidden Markov model with continuous observation densities and variable state durations (CDVDHMM). In the preprocessing stage, after binarization, noise removal, and extraction of connected components, a new algorithm is used to detect ascenders, descenders, dots, and other secondary parts and to remove them from the main image. A new segmentation algorithm, based on analysis of the upper contour and two auxiliary processes, is then applied. The purpose of this algorithm is to avoid under-segmentation as far as possible. Variable state durations are used to remove over-segmentation. After the right-to-left order is obtained, the CDVDHMM is applied to the resulting sequence of sub-characters. Eight features, comprising three Fourier descriptors and a number of structural features, are used to represent these symbols in the feature space. The feature vector is invariant to changes of scale. The states in this model comprise pure characters (without secondary parts) and several compound forms of the Nastaaligh writing style. Training the model is therefore easy and needs no re-estimation method. In the training stage, some of the model parameters are obtained from the set of training images and the rest from the dictionary, and a modified version of the Viterbi algorithm is then used for recognition. This algorithm yields more than one globally best path, treats the different positions and shapes of the letters as sub-states, and supports variable state durations. Tests carried out on handwritten samples and a 50-word dictionary gave good results for the method used.

Key words: word recognition, handwritten, segmentation, Farsi, Arabic, Nastaaligh, cursive, hidden Markov model.

To whom correspondence should be addressed. E-mail: safa@ce.aut.ac.ir, adibi@ce.aut.ac.ir

The Arabian Journal for Science and Engineering, Volume 30, Number 1B, April 2005.

ABSTRACT

This paper introduces a complete system for the recognition of Farsi Nastaaligh handwritten words using a continuous-density variable-duration hidden Markov model, CDVDHMM [1]. In the preprocessing stage, after binarization, noise reduction, and connected component specification, new algorithms are applied to find and eliminate ascenders, descenders, dots, and other secondary strokes from the original image. Then a new segmentation algorithm, based on analysis of the upper contour and two other processes, is applied. The main goal of this algorithm is to avoid under-segmentation. The variable-duration states of the model absorb the resulting over-segmentation. Once the right-to-left order has been found, the sequence of obtained sub-characters is modeled by the CDVDHMM. Eight features, including three Fourier descriptors and five structural and discrete features, are used to represent symbols in the feature space. This feature vector is invariant to size and shift. The states of the model are pure characters (without secondary strokes) plus some compound forms of characters in the Nastaaligh handwriting style. Thus, training the model becomes simple and does not need any re-estimation method. In the training stage, some parameters of the model are obtained from the training image set and the others from the dictionary. In the last stage, a modified version of the Viterbi algorithm is applied for recognition. This algorithm provides more than one globally best path, treats different positions and forms of letters as sub-states, and supports variable-duration states. Experiments on handwritten samples and a 50-word dictionary show very good performance of the system.

Key Words: OCR, handwritten, word recognition, segmentation, Farsi, Arabic, Nastaaligh, cursive, HMM, CDVDHMM.

1. INTRODUCTION

Off-line recognition of handwritten text has many applications in bank check processing, postal address and zip code recognition, and automated handwritten document entry and understanding. As a result, research interest in this field is increasing and some progress has been made. However, the performance of even the best handwritten text recognition systems is still far from human reading ability.

Many papers in recent years have addressed the recognition of Latin, Japanese, and Chinese characters. Although almost one third of the people in the world use Arabic and Farsi characters for writing, efforts toward the automated recognition of these characters have been few and scattered. This is probably the result of a lack of adequate support in terms of funding and other resources, such as comprehensive and standard Arabic or Farsi text databases and dictionaries, and certainly of the cursive nature of writing in these languages [2]. More details on the state of the art in Arabic character recognition are presented in [2].

An important criterion for classifying character recognition systems is whether they perform segmentation and, if so, which method they use. The concept and various methods of segmentation are reviewed in [3]. Three basic strategies for segmentation are proposed there, such that each segmentation method can be considered a weighted combination of these three strategies: (1) the classic strategy, which attempts to dissect images into classifiable units; (2) the recognition-based segmentation strategy, which looks for components of the image that match classes of the system's alphabet and decides about segmentation using feedback from the recognition stage; and (3) the holistic strategy, which tries to recognize a word as a whole.

The holistic methods have the advantage that the difficult dissection stage is not required, but their drawback is that the number of words for which the system is designed is limited and cannot be large. The classic and recognition-based methods, on the other hand, are more powerful and are not limited in the number of words they can recognize.

In this paper, we develop a system for off-line recognition of Nastaaligh handwritten words which uses a recognition-based segmentation method and applies a continuous-density variable-duration hidden Markov model to the recognition task. In Section 2, characteristics of Farsi and Arabic writing are briefly described. In Section 3, the hidden Markov model (HMM) and several word recognition systems based on HMMs are discussed. Section 4 describes the operational stages of the overall system. In Section 5, experimental results for each stage of the method are presented. Section 6 concludes the paper.

2. CHARACTERISTICS OF FARSI/ARABIC CURSIVE SCRIPT

Farsi/Arabic scripts differ from Latin scripts in several ways: (1) The shape of a Farsi/Arabic letter is a function of the position of that letter in the word. For each letter, there may be up to four different shapes depending on its position in the word, which are called the first, middle, last, and isolated forms of the letter (Table 1). In addition, in some Farsi writing styles, there is more than one shape for a letter in a fixed position. (2) Farsi and Arabic writing is naturally cursive. Nevertheless, some characters never connect to the next letter in the word. Because of this, a word can have more than one cursive part. These cursive parts are referred to here as sub-words. (3) Farsi and Arabic scripts have various styles, and each writing style can contain new and compound forms of letters. Thus, for an unconstrained handwritten Farsi text, for example, the number of separate classes that must be considered becomes very large.
This makes the recognition process very difficult. (4) Farsi and Arabic characters can have zero, one, two, or three dots over or under them, and sometimes the only difference between two characters is the existence or the number of these dots. (5) Farsi and Arabic text, in contrast to Latin text, is written from right to left.

A list of Farsi characters and their different forms is presented in Table 1. The Arabic alphabet is identical to the Farsi alphabet, except that Farsi has four more characters (these four characters are underlined in Table 1). Characters that are similar except for their dots or other secondary strokes can be considered as one family. For example, the family of character Be contains the characters of rows 2 to 5 of Table 1.

Table 1. Farsi character set and shapes of each character in different positions.

Row  Character  Isolated  First  Middle  Last
1  Alef  ا  ا  ا  ا
2  Be  ب  ب  ب  ب
3  Pe  پ  پ  پ  پ
4  Te  ت  ت  ت  ت
5  Se  ث  ث  ث  ث
6  Jim  ج  ج  ج  ج
7  Che  چ  چ  چ  چ
8  He  ح  ح  ح  ح
9  Khe  خ  خ  خ  خ
10  Dal  د  د  د  د
11  Zal  ذ  ذ  ذ  ذ
12  Re  ر  ر  ر  ر
13  Ze  ز  ز  ز  ز
14  Zhe  ژ  ژ  ژ  ژ
15  Sin  س  س  س  س
16  Shin  ش  ش  ش  ش
17  Sad  ص  ص  ص  ص
18  Zad  ض  ض  ض  ض
19  Ta  ط  ط  ط  ط
20  Za  ظ  ظ  ظ  ظ
21  Ayn  ع  ع  ع  ع
22  Ghayn  غ  غ  غ  غ
23  Fe  ف  ف  ف  ف
24  Ghaf  ق  ق  ق  ق
25  Kaf  ك  ك  ك  ك
26  Gaf  گ  گ  گ  گ
27  Lam  ل  ل  ل  ل
28  Mim  م  م  م  م
29  Noon  ن  ن  ن  ن
30  Waw  و  و  و  و
31  He  ه  ه  ه  ه
32  Ye  ي  ي  ي  ي

Different scripts that use the Farsi alphabet can be divided into eight groups, and each group can include different styles [4]. Some of these styles were more common in the past and some are more common today. Most of today's handwritten Farsi texts are written in the Nastaaligh and Naskh styles, and the Nastaaligh style, due to its special beauty, is the most popular writing style among writers. As a result, the Nastaaligh writing style is considered in this paper. Despite its wide popularity, however, Nastaaligh is a difficult style with numerous rules and exceptions, which make its automatic recognition very hard. Some examples that show such rules and the difficulties of the work are presented below.

The family of character Be (ب) (rows 2 to 5 of Table 1) is written in different forms, depending on the letter that follows.

The family of character He (ح) (rows 6 to 9 of Table 1) appears in different forms, depending on the letter that follows.

The family of character Kaf (ك) (rows 25 to 26 of Table 1) appears in different forms, depending on its position in the word.

The character Mim (م) is written in two different forms, rectangular and circular.

The character He (ه) (row 31 of Table 1) can appear in different forms, depending on its position in the word.

Cogs in words sometimes appear as curves.

The distance between a character and the middle baseline, or the vertical position of the character, can differ depending on the other characters of the word.

Some characters and compound forms rest on the baseline while others do not. In fact, some parts of words are written at an angle of about 30 degrees to the baseline.

Even if we ignore the problems arising from the multiple shapes of some characters, the last two rules alone are sufficient to indicate the difficulty of segmenting Nastaaligh handwritten words.

In order to create appropriate training and testing sets, we must select a minimal set of words that includes the various features of the handwriting style. We assume that the training and testing sets are constrained to the Nastaaligh style only, to reduce the number of patterns that must be recognized. The words included in these sets are carefully selected so that they contain all characters, character shapes, and compound forms of characters in the Nastaaligh style. Figure 1 shows some samples of the words used and Figure 2 illustrates four compound forms of characters in the Nastaaligh style which are included in our system. More details about the selected training and testing sets are presented in Section 5.1.

Figure 1. Some samples of handwritten words in the test set: تخصيص, باج گيرها, هدهد, كاكل, تسريع, مهمات.

Figure 2. Four compound forms in the Nastaaligh Farsi writing style.
1. Compound form made by composition of three letters, ك-م-ك or ك-م-ل or ك-م-ا: كمك, اكمل, كمال.
2. The ط-ه composition: ظهر.
3. The ه-ا composition: بعدها.
4. The ك-ل, ك-ا, and ك-ك compositions: كاكل, كك.
In the above cases, we can also consider گ and ظ in place of ك and ط, respectively.

3. TEXT RECOGNITION BASED ON HIDDEN MARKOV MODELS

3.1. Introduction to HMM

Hidden Markov models are based on doubly stochastic processes whose underlying random process is not directly observable (i.e., it is hidden). The transition of the system from the current state to the next state is governed by this underlying process. Observable outputs, or observations, are produced by another stochastic process, which is determined by the symbol probabilities. A hidden Markov model with discrete observation symbols is represented by $\lambda = (A, B, \Pi)$, where $A$ is the matrix of state transition probabilities, $B$ is the set of discrete probability distributions of the observation symbols, and $\Pi$ is the vector of initial state probabilities [5].
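To make the notation concrete, the following minimal sketch draws a short observation sequence from a discrete HMM $\lambda = (A, B, \Pi)$. The two-state, three-symbol model, its probabilities, and the function name `sample_hmm` are illustrative assumptions and are not taken from this paper.

```python
import random

def sample_hmm(A, B, Pi, T, seed=0):
    """Draw T hidden states and T observed symbols from a discrete HMM."""
    rng = random.Random(seed)

    def draw(probs):
        # Sample an index according to a discrete probability distribution.
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]

    states, symbols = [], []
    state = draw(Pi)                    # hidden process: choose the first state
    for _ in range(T):
        states.append(state)
        symbols.append(draw(B[state]))  # observable process: emit a symbol
        state = draw(A[state])          # hidden process: move to the next state
    return states, symbols

# Hypothetical model (illustrative numbers only).
A = [[0.7, 0.3], [0.4, 0.6]]            # state transition probabilities
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]  # symbol probabilities per state
Pi = [0.6, 0.4]                         # initial state probabilities
states, symbols = sample_hmm(A, B, Pi, T=5)
```

Only `symbols` would be visible to a recognizer; `states` is the hidden sequence that the recognition problem tries to recover.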

In some applications, the distribution of observations is considered continuous and the duration probability of states is modeled explicitly. For example, in [1] a continuous-density variable-duration HMM (CDVDHMM) is applied. This model is represented by $\lambda = (\Pi, A, \Gamma, B, D)$, whose parameters are:

Initial probability: $\Pi = \{\pi_i\}$; $\pi_i = \Pr\{i_1 = q_i\}$, $i_1$ = first state (1)

Transition probability: $A = \{a_{ij}\}$; $a_{ij} = \Pr\{q_j \text{ at } t+1 \mid q_i \text{ at } t\}$ (2)

Last-state probability: $\Gamma = \{\gamma_i\}$; $\gamma_i = \Pr\{i_T = q_i\}$, $i_T$ = last state (3)

Symbol probability: $B = \{b_j(O_t^{t+d})\}$; $O_t^{t+d} = (o_t\, o_{t+1} \ldots o_{t+d})$, $b_j(O) = \Pr\{O \mid q_j\}$ (4)

Duration probability: $D = \{P(d \mid q_i)\}$; $P(d \mid q_i) = \Pr\{\text{duration}(q_i) = d\}$ (5)

In the study of HMMs, there are three basic problems: (1) Given an observation sequence $O = \{o_1, \ldots, o_T\}$ and a model $\lambda$, how do we compute $P(O \mid \lambda)$ efficiently? This is the scoring problem. (2) Given an observation sequence $O$ and a model $\lambda$, how can we find the state sequence $q = \{q_1, \ldots, q_T\}$ for $O$ that is optimal in some sense (i.e., best explains the observation sequence)? This is the recognition problem. (3) How can we find the parameters of the model that maximize $P(O \mid \lambda)$? This is the training problem.

The solution to problem (1) is the forward or backward procedure. For problem (2), the most common optimization criterion is finding an optimal state sequence (an optimal path). The Viterbi algorithm yields such a sequence. To solve problem (3), one can apply the Baum-Welch re-estimation algorithm. The reader is referred to [5] for more details about these solutions.

Text Recognition with HMM

In word recognition problems, there are two main approaches to modeling the observation sequence (pseudo-characters) by an HMM [6]. The first approach is called the model discriminant HMM. In this strategy, a model is constructed for each class of the problem (each word in the lexicon). To recognize an input word, the score for matching the word to each model is computed, and the class of the model with the maximum score is the recognition result. This approach is reasonable for small dictionary sizes, say up to several hundred words. When the size of the dictionary grows to about 1000 words or more, however, this approach becomes excessively complex in terms of computation and memory. The second approach, called the path discriminant HMM, is to build only one model for all classes and use different paths (state sequences) to distinguish one pattern from the others.
A test pattern is classified into the class which has the maximum path probability over all possible paths. This approach is a better alternative for a large or variable dictionary.

Some researchers have applied path discriminant methods to Latin handwritten word recognition [1, 6–8]. In [7], a second-order HMM is also tested, which performs better than the first-order model for long words. In [6], the inputs to the system are assumed to be unconstrained handwritten words. In this system, the number of states may become too large, so the speed and precision of the system can decrease. In [1], this problem is removed by considering variable durations for states and using an over-segmentation method, which does not leave any two letters unsegmented. In addition, considering continuous densities for the observation probabilities improves the performance of the system. The problem with this system is the unreliable training of state duration probabilities from limited training databases. To remove this problem, a system is proposed in [8] whose operation is independent of state duration probabilities.

In [8–12], model discriminant approaches are used. In these systems, a left-to-right (for Latin characters) or right-to-left (for Arabic characters) HMM is considered for each character, and the word model is obtained by concatenating these HMMs. In [13] and [14], 2-D hidden Markov models are applied to the recognition of printed Arabic words, and in [15], the model is used to improve the recognition of Farsi printed sub-words.

4. THE RECOGNITION SYSTEM

In this paper, a system for off-line recognition of Farsi Nastaaligh handwritten words is presented. The stages of the system are illustrated in the block diagram of Figure 3. First, the necessary preprocessing algorithms are applied to the image of the word.
Then, the word is dissected into its letters or pseudo-letters, and a set of features is extracted from the image of each segment or combination of adjacent segments. Recognition is based on the classification of these feature vectors: in the recognition stage, the previously trained model recognizes the word from them. Since single segments and combinations of adjacent segments are examined to find the optimal letters, the segmentation method used in our system can be classified as a recognition-based method. Because of the successful applications of hidden Markov models (HMMs) in word recognition systems [1, 6, 16, 17], an HMM-based method is selected for the recognition stage of the system. The applied model is a continuous-density variable-duration HMM [1]. The cycle of recognition, combination of adjacent segments, and feature extraction, which is driven by the Viterbi algorithm for the HMM, is the key factor in the optimal determination of words in this system.
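The interplay between variable-duration states and the combination of adjacent segments can be sketched as a Viterbi search in which each state may cover one or more consecutive segments. The code below is a minimal illustration under stated assumptions and is not the modified Viterbi algorithm of this paper (it returns a single best path and has no sub-states). The function names, the toy one-dimensional "width" feature, the Gaussian emission, and all numeric values are hypothetical.

```python
import math

NEG_INF = float("-inf")

def vd_viterbi(n_seg, log_pi, log_a, log_gamma, log_dur, emit_logp):
    """Best path when each state may cover 1..D adjacent segments.

    log_dur[j][d-1] is log P(d | q_j); emit_logp(j, s, e) is the log symbol
    probability of state j for segments s..e-1 merged into one observation.
    Returns (log score, [(state, start, end), ...]).
    """
    n_states, max_d = len(log_pi), len(log_dur[0])
    # delta[t][j]: best score of a path whose last state j ends at segment t.
    delta = [[NEG_INF] * n_states for _ in range(n_seg + 1)]
    back = [[None] * n_states for _ in range(n_seg + 1)]
    for t in range(1, n_seg + 1):
        for j in range(n_states):
            for d in range(1, min(max_d, t) + 1):
                s = t - d
                local = log_dur[j][d - 1] + emit_logp(j, s, t)
                if s == 0:                       # state j opens the word
                    cand, prev = log_pi[j] + local, None
                else:                            # best predecessor ending at s
                    prev = max(range(n_states),
                               key=lambda i: delta[s][i] + log_a[i][j])
                    cand = delta[s][prev] + log_a[prev][j] + local
                if cand > delta[t][j]:
                    delta[t][j], back[t][j] = cand, (s, prev)
    last = max(range(n_states), key=lambda j: delta[n_seg][j] + log_gamma[j])
    score = delta[n_seg][last] + log_gamma[last]
    if score == NEG_INF:
        return score, []
    path, t, j = [], n_seg, last
    while j is not None:                         # follow back-pointers
        s, prev = back[t][j]
        path.append((j, s, t))
        t, j = s, prev
    return score, path[::-1]

# Toy data: three segments; the first letter was over-segmented into two parts.
widths = [5.0, 5.0, 20.0]          # hypothetical segment widths
means, sigma = [10.0, 20.0], 2.0   # hypothetical per-state mean widths

def emit_logp(j, s, e):
    w = sum(widths[s:e])           # feature of the merged segments
    return -0.5 * ((w - means[j]) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

u = math.log(0.5)
log_dur = [[math.log(0.6), math.log(0.4)]] * 2
score, path = vd_viterbi(3, [u, u], [[u, u], [u, u]], [u, u], log_dur, emit_logp)
```

In this toy case the search merges the first two segments into state 0 and assigns the third segment to state 1, which is how variable durations absorb over-segmentation.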

[Figure 3: Input word image → Preprocessing → preprocessed image → Segmentation → images of segments → Feature extraction → feature vectors → Recognition → recognized word; feedback path: Recognition → Combining adjacent segments → images of combined segments → Feature extraction.]

Figure 3. Block diagram of the system.

4.1. Preprocessing

In the preprocessing stage, the input image is first binarized by means of the iterative threshold selection method [18]. Then two morphological operations, a closing with a 3×3 and an opening with a 2×2 structuring element, are applied to the image in that order to eliminate spike noise [1]. Next, connected components are found by an algorithm which starts from the top row of the image and builds bounding rectangles. As subsequent rows are added, the height and width of these rectangles are modified so that, when the bottom of the image is reached, the final bounding rectangles, i.e., the connected components, are obtained [19]. The pen width is estimated by an iterative algorithm in which the mean value of the vertical run lengths is computed over the columns. Run lengths larger than 1.5 times this mean value are then excluded from the computation of the mean in subsequent iterations [6].

For handwritten Farsi/Arabic words, and especially for the Nastaaligh writing style, baseline detection is a difficult and unreliable process. For the Nastaaligh style, more than one baseline, or a slanted baseline, must sometimes be considered. Thus, the information provided by the horizontal histograms of words may not be sufficient for baseline detection. Some other methods for baseline detection have also been proposed (e.g., in [20]), but due to their high complexity they should be used only when necessary. As a result, we design our system so that it works independently of the baseline. In our system, since the model receives a sequence of segments, the right-to-left order of the segments must be specified after segmentation.
In the absence of the baseline, the ascenders of characters Kaf and Gaf (Figure 4(b)) and the descenders of characters Jim, Che, He, Khe, Ayn, and Ghayn (Figure 4(a)) cause incorrect determination of the right-to-left order. Thus, it is desirable to eliminate problematic ascenders and descenders in the preprocessing stage. This elimination also prevents their incorrect segmentation and therefore decreases segmentation errors. Furthermore, since the recognition process is based on the pure body of characters, dots and other secondary strokes must be eliminated from the image in the preprocessing stage. These secondary strokes are processed in a post-processing stage, which completes the recognition task. To eliminate ascenders, descenders, and dots, several new algorithms are proposed, which are explained in the following sections.

Figure 4. Problems arising from descenders and ascenders: (a) in تسريع, Ayn will be considered before Ye; (b) in بكمكش, Kaf will be considered before Be.

Ascender Elimination

Characters Kaf and Gaf have ascenders that should be eliminated. The characteristics that discriminate ascenders from other strokes in a word are their almost 45-degree slope, their straightness, and their relatively large length. These three features form the basis of the ascender detection and elimination algorithm. Character Kaf has only one long ascender, while Gaf has one long and one short ascender. The algorithm for detection and elimination of ascenders is shown in Figure 5.

1. For each valid connected component (CC) do:
   1.1. Find the top-most point of the lower contour and call it SP (starting point). Let k = k1 = 0.
   1.2. While the stop condition is not true, starting from SP, do:
        Traverse the lower contour downward by going to the next point of the lower contour. k = k + 1.
        If (current move is in the left-down direction): k1 = k1 + 1.
   If (a0×PW > k > a1×PW AND k1/k > b1): a short ascender is detected. Mark it.
   If (k > a0×PW AND k1/k > b0): a long ascender is detected. Mark it, eliminate its lower contour points from the lower contour of this CC, and go to step 1.1 to search for other ascenders that may exist in this CC.
2. For each ascender detected in step 1 do:
   2.1. Check validity conditions.
        If (this ascender is valid): eliminate it from the image by filling it with color ASCENDER_COLOR.

Figure 5. Ascender detection and elimination algorithm.

In this algorithm, PW is the estimated pen width. The parameter values showing the best results in experiments are 4.95, 1.4, 0.58, and 0.33 for a0, a1, b0, and b1, respectively. The lower contour of each connected component is found from the chain code of the outer contour obtained while finding the connected components. The East, North-East, and South-East directions in the chain code of the outer contour represent the lower contour. In step 1.2 of the algorithm, the stop condition becomes true if one of the following situations occurs while the lower contour is traversed:
(i) Movements in the left direction, in the left-up direction, or in a combination of these two directions continue for more than 3, 1, and 3 pixels, respectively.
(ii) A jump occurs with a displacement of more than 1 pixel upward or to the right, more than 4 pixels downward, or more than 3 pixels to the left.
(iii) The last point of the lower contour is reached.
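The decision made at the end of one lower-contour traversal can be sketched as follows. This is a minimal illustration that assumes k counts the contour points traversed and k1 counts the left-down moves, and that moves are given as (dx, dy) pixel steps in image coordinates with y growing downward. The function names and the move encoding are hypothetical; the threshold values are the ones reported above.

```python
# Threshold values reported in the text for a0, a1, b0, and b1.
A0, A1, B0, B1 = 4.95, 1.4, 0.58, 0.33

def count_moves(moves):
    """Return (k, k1): total moves and left-down moves of one traversal."""
    k = len(moves)
    k1 = sum(1 for dx, dy in moves if dx < 0 and dy > 0)
    return k, k1

def classify_traversal(k, k1, pw):
    """Return 'long', 'short', or None for one traversal, given pen width pw."""
    if k == 0:
        return None
    ratio = k1 / k                 # fraction of moves going left-down
    if k > A0 * pw and ratio > B0:
        return "long"
    if A1 * pw < k < A0 * pw and ratio > B1:
        return "short"
    return None

# Example: 30 moves, 24 of them left-down, with an estimated pen width of 4.
moves = [(-1, 1)] * 24 + [(0, 1)] * 6
k, k1 = count_moves(moves)
label = classify_traversal(k, k1, pw=4)
```

A candidate marked this way would still have to pass the validity conditions of step 2 before being removed from the image.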
In step 2, a long ascender is considered invalid if it has a long overlap with other strokes near the head of the ascender, if there is a relatively large change in the stroke width near its head, or if there is a downward vertical part at the head of the ascender. A short ascender is valid if it covers approximately all the space of its connected component and lies above, and very close to, a long ascender.

Figure 6 illustrates the application of the algorithm to a word having two long ascenders and one short ascender. Figure 6(b) shows the lower contour of the word in Figure 6(a). At first, the lower contour is traversed from SP1 to EP1, where EP1 is the last pixel in the lower contour of this connected component. Variables k and k1 satisfy the conditions of a long ascender. Thus SP2 is selected as the new starting point. At EP2, a long jump to the right terminates the loop and another long ascender is detected. Starting again from SP3, the last point of the current connected component, i.e., EP3, is reached and the values of k and k1 indicate a short ascender. The validity conditions are true for all these ascenders and they are eliminated from the image, as shown in Figure 6(c).

Figure 6. (a) A word with characters Kaf and Gaf. (b) Three ascenders are detected and illustrated with their starting points (SP) and end points (EP). (c) All detected ascenders are valid and are eliminated from the image.

In Figure 7, two samples of the operation of this algorithm are shown. Figure 7(a) shows a successful elimination of the ascenders of two Kaf letters, while Figure 7(b) illustrates a mistake of the algorithm. In this figure, character Te is incorrectly eliminated as an ascender because of its 45-degree slope and relatively large length.

Figure 7. Operation of the ascender elimination algorithm: (a) correct operation; (b) incorrect operation.

Descender Elimination

The algorithm that detects and eliminates the descenders of characters Jim, Che, He, Khe, Ayn, and Ghayn is shown in Figure 8. These descenders cause incorrect determination of the right-to-left order. The algorithm works as follows. It starts from the right-most column of the image that contains black pixels, selects the bottom-most black run in it, and follows this black run column by column toward the left until it joins another black run (step 1.1.3). A detected descender is considered valid if the following conditions are true:
(i) The overlap length of this stroke with upper strokes (UpRunLen) is relatively large (more than 2.5×PW, where PW is the estimated pen width).
(ii) The length of this stroke is relatively large (more than 2.5×PW).
(iii) In at least one column, there are more than two black runs.

1. For each valid connected component (CC) do:
   1.1. For each column of the current CC, starting from the right-most column, do:
        Find the black runs in the current column and let the lowest black run be the current run (CurRun). UpRunLen = 0.
        If (number of black runs is greater than 1 AND the current column is not the right-most column): UpRunLen = UpRunLen + 1.
        If (number of black runs adjacent to CurRun is more than 1): the current CC does not contain any descender; go to the next CC. Else: let the black run adjacent to CurRun be CurRun.
        If (number of black runs in the previous column that are adjacent to CurRun is more than 1): a descender is detected:
             Check validity conditions.
             If (this descender is valid): eliminate it from the image by filling it with color DESCENDER_COLOR.

Figure 8. Descender detection and elimination algorithm.

These characteristics discriminate descenders from other strokes properly. Figure 9(a) shows a typical result obtained by this algorithm. Figures 9(b) and 9(c) show situations that satisfy two of the conditions tested in the steps of the algorithm.

Figure 9. (a) Operation of the descender elimination algorithm. (b), (c) Situations that satisfy conditions tested in steps of the algorithm.
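Two building blocks of this procedure can be sketched compactly: extracting the black runs of an image column and checking the three validity conditions (i) to (iii). The sketch below is illustrative only; the function names, the 0/1 pixel encoding, and the way the stroke measurements are passed in are assumptions, while the 2.5×PW thresholds come from the text above.

```python
def black_runs(column):
    """Return (start, end) row ranges of consecutive black (1) pixels, top to bottom."""
    runs, start = [], None
    for r, px in enumerate(column):
        if px and start is None:
            start = r                      # a new run begins
        elif not px and start is not None:
            runs.append((start, r))        # the run ended at row r - 1
            start = None
    if start is not None:
        runs.append((start, len(column)))  # run reaches the bottom of the column
    return runs

def descender_is_valid(up_run_len, length, max_runs_in_column, pw):
    """Apply validity conditions (i), (ii), and (iii) for a detected descender."""
    return (up_run_len > 2.5 * pw          # (i) long overlap with upper strokes
            and length > 2.5 * pw          # (ii) the stroke itself is long
            and max_runs_in_column > 2)    # (iii) some column has more than two runs

# Example column with two black runs; the lowest one would be CurRun.
runs = black_runs([0, 1, 1, 0, 1])
cur_run = runs[-1]
```

Here `runs` is `[(1, 3), (4, 5)]`, with `end` exclusive, so the lowest run occupies row 4 only.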

Secondary Strokes Elimination

To detect secondary strokes, characteristics such as their small size and their containment within a larger sub-word can be used. We consider a connected component to be a secondary stroke, and eliminate it from the image, if its width is less than 2.5 times and its height is less than 5 times the estimated pen width (or vice versa) and it overlaps a larger component over at least 25 percent of its width. These thresholds are chosen so that no other strokes are incorrectly eliminated as secondary strokes. As a consequence, the algorithm retains some secondary strokes that are written moderately large or are distant from the character body. To enhance the performance of this algorithm, a primary classifier can be used in the preprocessing stage to discriminate secondary strokes from letters of nearly the same size (such as the isolated forms of letters Alef, Dal, Zal, Re, Waw, and He) [21]. Since the number of classes is smaller in this case, the features extracted from the image can be simpler and can be optimized for recognition.

Segmentation and Determination of the Right-to-Left Order

The objective of the segmentation stage here is to achieve an over-segmentation in which every pair of connected characters is split. Characters can then be considered as states in the recognition stage [1]. When a character is split into more than one segment, the variable duration of the HMM states used in this system absorbs the problem. After segmentation, the right-to-left order of the segments must be found for use in the recognition stage. We studied the existing word segmentation techniques and their ability to satisfy these criteria. The methods based on the vertical histogram or the baseline [22–25] are not suitable for handwritten words, especially in the Nastaaligh style, because of the various vertical overlaps and horizontal slants that exist in this style.
Furthermore, methods that use the vertical width of strokes [26] do not seem to be very appropriate for moderately free handwritten scripts. We propose two enhanced methods which are more suitable for Nastaaligh word segmentation. The first method is based on the idea of regular and singular components, and considers the regular components as candidates for segmentation. The second method is based on analysis of the upper contour of the words. In the next sections, these segmentation methods and the technique used for finding the right-to-left order of segments are explained.

Segmentation using Regular and Singular Components

Segmentation based on regular and singular components (or regularities and singularities) is proposed in [27], [6], and [1] for Latin handwritten words and in [19] for Arabic handwritten words. We have implemented a segmentation method based on the same idea. In this method, the holes in the preprocessed image (such as the loops in characters ص, ط, ه, etc.) are first filled to avoid segmentation in these loops. Then, an opening operation is performed on the image with a vertical structuring element whose height is a little (one or two pixels) larger than the estimated pen width. In this way, the moderately vertical parts of the image are obtained. Next, a closing operation with a horizontal structuring element of small width (about three to five pixels) is performed to join vertical parts (resulting from the previous operation) that are close to each other. The results of this operation are called singularities or islands. By subtracting these components from the original image, the regularities or bridges are found. At this point, some characters may have too many regularities, which would cause unacceptable over-segmentation. To reduce this problem, regularities with a width smaller than a threshold (e.g., the estimated pen width) are eliminated; i.e., they are added to the singularities.
Then, among the remaining regularities (bridges), those that do not join two singularities (two islands) are also eliminated, i.e. added to the singularities (these regularities are the starting or ending components of the sub-words). Finally, segmentation is performed at the middle of the final regularities. The following parameters affect the performance of this algorithm:

The height of the structuring element for the opening operation: larger values of this parameter leave fewer pixels in the singularities and therefore more in the regularities, and thus increase over-segmentation and decrease under-segmentation. Since our goal here is to decrease under-segmentation, a value equal to the estimated pen width plus two is selected for this parameter. This value gave better results experimentally.

The width of the structuring element for the closing operation: smaller values of this parameter result in more over-segmentation and less under-segmentation. In [1] the value 5 is proposed for this parameter. We have selected the value 3.

April 2005 The Arabian Journal for Science and Engineering, Volume 30, Number 1B.
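The opening and closing steps of the first segmentation method can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the image is a list of rows of 0/1 pixels with holes already filled, the function names are hypothetical, and the two filtering steps (dropping narrow regularities and those that do not join two singularities) are omitted. Opening with a vertical line element is computed as "keep vertical runs of at least that height", and closing with a horizontal line element as "fill horizontal gaps shorter than that width".

```python
def vertical_opening(img, height):
    """Keep only pixels lying in vertical black runs of at least `height` pixels."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for c in range(cols):
        r = 0
        while r < rows:
            if not img[r][c]:
                r += 1
                continue
            start = r
            while r < rows and img[r][c]:
                r += 1
            if r - start >= height:
                for k in range(start, r):
                    out[k][c] = 1
    return out


def horizontal_closing(img, width):
    """Fill horizontal gaps shorter than `width` that lie between two black pixels."""
    out = []
    for row in img:
        new_row = row[:]
        last = None  # column of the most recent black pixel in this row
        for c, value in enumerate(row):
            if value:
                if last is not None and 0 < c - last - 1 < width:
                    for k in range(last + 1, c):
                        new_row[k] = 1
                last = c
        out.append(new_row)
    return out


def split_regular_singular(img, pen_width, close_width=3):
    """Return (singularities, regularities) for a hole-filled binary image."""
    opened = vertical_opening(img, pen_width + 2)      # pen width plus two, as in the text
    singular = horizontal_closing(opened, close_width)  # width 3, as in the text
    regular = [[1 if p and not s else 0 for p, s in zip(prow, srow)]
               for prow, srow in zip(img, singular)]
    return singular, regular


def cut_columns(regular):
    """Middle column of every horizontal stretch that contains regularity pixels."""
    cols = len(regular[0])
    has_bridge = [any(row[c] for row in regular) for c in range(cols)]
    cuts, c = [], 0
    while c < cols:
        if has_bridge[c]:
            start = c
            while c < cols and has_bridge[c]:
                c += 1
            cuts.append((start + c - 1) // 2)
        else:
            c += 1
    return cuts
```

For two vertical bars joined by a thin horizontal stroke, the bars come out as singularities, the stroke as a regularity, and a single cut is placed at the middle of the stroke.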

Figure 10 shows some words, their regularities and singularities, and the resulting segmentation.

Figure 10. Operation of the first segmentation method, which is segmentation based on regularities and singularities. (a) Binarized and noise-reduced images of the words محجوب, بيفكر, and ياسمن. (b) Singularities and regularities, shown in black and gray, respectively. (c) Resulting segmentation.

Segmentation using Local Minima of the Word Upper Contour

In [28], the local minima of the upper contour of words are considered as candidate positions for segmentation. If certain conditions are satisfied, segmentation is performed at these positions. In addition, overlapping areas are detected and, if required, segmentation is performed there. We have modified this method to make it suitable for Nastaaligh handwritten words. In the Nastaaligh style, when the character Re is connected to the character before it, it is written without any upper contour minimum between it and the previous character. As a result, this method is not able to segment the character Re. Therefore, a new algorithm for the detection and segmentation of the connected Re is developed. This algorithm is applied to the word image first. Then the overlapping areas are detected and proper segmentation is performed there. Next, the upper contour is found by a simple method, and its local minima are taken as primary segmentation points (PSPs). A validation process is performed for these PSPs, and the word is segmented at the positions of the valid PSPs. The algorithms for these steps are explained below.

Detection and Segmentation of the Connected Re

The connected Re is detected using an idea similar to the one used for ascender detection. The special characteristics that discriminate the connected Re from other strokes are its slope of almost 45 degrees and its large length. The proposed algorithm is shown in Figure 11.
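The slope test at the core of the connected Re detector (Figure 11) can be illustrated with a small sketch. It is written under several assumptions that the paper does not state: the lower contour is given as one height value per traversed column, a 3-pixel segment is called slanted when its two ends differ in height by at least one pixel, and the smoothing state machine is approximated by flipping any label that differs from two identical neighbours (which covers the SHS to SSS case mentioned in the algorithm). All function names are hypothetical.

```python
def label_segments(lower_contour, seg_len=3):
    """Label consecutive seg_len-pixel pieces of the lower contour as H or S."""
    labels = []
    for i in range(0, len(lower_contour) - seg_len + 1, seg_len):
        # Assumed criterion: any height change across the piece means "slanted".
        rise = abs(lower_contour[i + seg_len - 1] - lower_contour[i])
        labels.append("S" if rise >= 1 else "H")
    return "".join(labels)


def smooth_labels(labels):
    """Replace an isolated label by its neighbours' label (e.g. SHS becomes SSS)."""
    out = list(labels)
    for i in range(1, len(labels) - 1):
        if labels[i - 1] == labels[i + 1] != labels[i]:
            out[i] = labels[i - 1]
    return "".join(out)


def looks_like_connected_re(lower_contour, pen_width, pr4=2.0, seg_len=3):
    """True when the slanted columns exceed PR4 x PW (PR4 = 2 in Figure 11)."""
    labels = smooth_labels(label_segments(lower_contour, seg_len))
    slanted_columns = labels.count("S") * seg_len
    return slanted_columns > pr4 * pen_width
```

A steadily sloping contour of twelve columns with one flat piece in the middle is labelled SHSS, smoothed to SSSS, and accepted for a pen width of 3; a flat contour of the same length is rejected.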

1. For each valid connected component do:
1.1. Let ncol = 0.
1.2. For each column of this connected component, from the leftmost to the rightmost column, do:
1.2.1. If (there exists more than one black run in the current column OR the width of some run is more than PR0 × PW): exit the loop (i.e. goto step 1.3).
1.2.2. If (the bottommost black point of the current column is more than PR1 pixels below the lowest black point of the first column in the current decreasing trend (to allow for a probable rising end of Re)): exit the loop (i.e. goto step 1.3).
1.2.3. ncol = ncol + 1.
1.3. If (ncol <= PR3 × PW): the sub-word does not contain Re. Else:
1.3.1. The traversed lower contour is considered as a sequence of segments of 3-pixel length, and a label H or S is assigned to each segment on the basis of, respectively, the horizontal or slanted form of the segment.
1.3.2. The sequence of the labels H and S is smoothed by a state machine (e.g. SHS is converted to SSS, etc.).
1.3.3. If (the number of columns that are considered slanted is more than PR4 × PW): a cut with color SEGMENTATION_COLOR is produced at the rightmost of the traversed columns.

Figure 11. Connected Re detection and segmentation algorithm. The values of the parameters are PR0=2.9, PR1=1, PR2=1, PR3=4.8, PR4=2.

The proposed algorithm performs very well. Figure 12 shows some results of this method.

Figure 12. The result of the connected Re detection algorithm for three words: ظهر, شرف, and تيررس.

Detection and Segmentation of Overlapped Strokes

Segmentation using the local minima of the upper contour cannot segment overlapped strokes either. For example, the middle form of He (row 8 in Table 1) in the Nastaaligh style, the final Ye, or the middle Mim cannot be segmented by this method. This problem can be solved by finding the overlapped strokes and performing segmentation in these areas (Figure 14). We propose a new algorithm for the detection and segmentation of overlapped strokes, which is shown in Figure 13.
Figure 14 shows several good results obtained from this algorithm.

Finding the Minima of Upper Contour and Segmentation

The first step in this stage is finding the upper contour. For this purpose, the outer contour found in the previous stages is traversed, the chain code at each of its points is obtained, and the points with West, North-West, and South-West directions are saved as the upper contour. Then the weak noise on this contour (one pixel wide and less than ContourNoise high) is eliminated. However, this elimination may cause under-segmentation in some cases in the handwritten Nastaaligh style, e.g. for weak cogs. Thus, we set the parameter ContourNoise to zero. Figure 15 shows the upper contour of a word resulting from this algorithm. Then the local minima of the upper contour are found and the segmentation is performed based on them. The algorithm for these operations is shown in Figure 16.

1. For each valid connected component do:
1.1. For each column of this connected component, from the rightmost to the leftmost column, do:
1.1.1. The number, length, and position of the black runs in the current column are found.
1.1.2. For each black run, starting from the bottommost one, do: If (there are no black points in the right column adjacent to this run AND overlap was found):
- Move toward the left until the two overlapped pieces are joined together. In this way, the two columns between which the overlap exists are found.
- By traversing the outer contour, check that the overlap is not related to a loop.
- On the upper part of the overlap, the position with the least width is found and marked for segmentation.

Figure 13. Overlapped strokes detection and segmentation algorithm.

Figure 14. Operation of the overlap detection algorithm for three words: محجوب, ياسمن, and انگليسي.

Figure 15. Obtained upper contour for the word تخصيص.

1. For each valid connected component do:
1.1. Finding PSPs: While traversing the upper contour, the points at which a falling trend is replaced by a rising one are found by a state machine, and the positions of these local minima are saved as primary segmentation points (PSPs). If the minimum value extends over more than one pixel, the PSP is placed on the pixel where the pen width is minimum; if this situation also extends over more than one pixel, the PSP is placed at the middle of this part.
1.2. Validation of PSPs: For each PSP found, if there is no loop under it and the pen width there is lower than THR1 × PW (THR1 is set to 4), this candidate point is valid and is labeled for segmentation.
1.3. Doing segmentation: At the labeled pixels, a cut is made with color SEGMENTATION_COLOR only if the left and right sides of the cut are black. If one side of the cut is a loop, the column of the hole that is adjacent to the cut is filled with black to prevent this adjacency.
1.4. After segmentation, the cuts that result in small segments (less than three pixels wide) are canceled.

Figure 16. Minima of the upper contour detection and segmentation algorithm.
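The PSP search and validation steps of Figure 16 can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' code: the upper contour is represented as one height value per column (larger means higher), `thickness[c]` is an assumed per-column estimate of the local pen width, the loop test of the validation step is left out, and the function names are hypothetical.

```python
def find_psps(contour, thickness):
    """Columns where a falling trend of the upper contour turns into a rising one."""
    psps = []
    falling = False
    plateau_start = 0  # first column of the current lowest stretch
    for c in range(1, len(contour)):
        if contour[c] < contour[c - 1]:
            falling = True
            plateau_start = c
        elif contour[c] > contour[c - 1]:
            if falling:
                plateau = range(plateau_start, c)
                thinnest = min(thickness[k] for k in plateau)
                candidates = [k for k in plateau if thickness[k] == thinnest]
                # Several equally thin columns: take the middle one.
                psps.append(candidates[len(candidates) // 2])
            falling = False
        # Equal heights: the plateau simply continues.
    return psps


def validate_psps(psps, thickness, pen_width, thr1=4.0):
    """Keep the PSPs whose local pen width is below THR1 x PW (loop test omitted)."""
    return [c for c in psps if thickness[c] < thr1 * pen_width]
```

For example, with `contour = [5, 4, 3, 3, 3, 4, 5, 4, 2, 6]` and `thickness = [2, 2, 3, 2, 2, 2, 2, 2, 9, 2]`, two minima are found; the second one sits on a stroke much thicker than four pen widths and is rejected when the pen width is 2.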

Figure 17 shows three samples of the operation of the complete segmentation algorithm, i.e. after running the algorithms of Figures 11, 13, and 16.

Figure 17. Operation of the second segmentation method, which is segmentation based on the minima of the upper contour.

Finding Right-to-Left Order

A relatively complicated algorithm for finding the right-to-left order of the segments is proposed in [1]. The algorithm proposed here is much less complicated. After ascender and descender elimination, the order can be found independently of the baseline. First, the order of the sub-words is found. Then, in each sub-word, the order of the segments is obtained by taking the rightmost segment as the first one in the sub-word, traversing the outer contour, and recording the order in which the segments are visited. This algorithm is shown in Figure 18.

Feature Extraction

The feature extraction method used in a character recognition system is probably the most important factor in achieving a good recognition rate [29]. Many different feature extraction methods are proposed in the literature, and the most suitable ones are generally found experimentally. After studying various feature extraction methods and testing some of them [21], we selected a mixed feature vector containing various features from the binary and outer contour representations of pseudo-character images. The features we tested include geometric moments [18] extracted from binary and thinned representations; Fourier descriptors [30] extracted from the outer contour; discrete and structural features extracted from the binary representation, including loops, height-to-width ratio, the ratio of the number of black points to the total number of points, and the position of connection to the right and left pseudo-characters [21]; and pixel distribution features plus some other discrete features such as end points, T-joints, X-joints, and zero-crossing features [6] extracted from skeletons.
Various combinations of these features were tested on images of ideally segmented characters using a mixture-of-Gaussian classifier. Finally, eight features were selected: three Fourier descriptors (descriptors number one to three), the number of loops, the height-to-width ratio, the ratio of the number of black points to the total number of points, and the positions of the right and left connections. This mixed feature vector, in addition to its high discrimination power, is short, which increases the speed of recognition. With these features no skeletonization is required, so we avoid the complexity of that process. The feature vector is invariant to scale and shift.

Normalization of the features [6] makes their discrimination effects roughly equal. However, some features may have more discrimination power than others, and normalization ignores this difference. We therefore found a weight for each feature experimentally instead of normalizing them. These weights, which reflect the importance of each feature, resulted in more discrimination power in experiments. The Fourier descriptors, which are normalized by the first descriptor, are used with unity weight, the loop feature with weight 40, the height-to-width ratio with weight 15, the ratio of the number of black points to the total number of points with weight 45, and the left and right connection positions with weight 40. This feature vector showed a good performance. Table 3 compares the performance of these features with the features proposed in [6].
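As a small illustration of weighting in place of normalization, the sketch below scales an eight-element feature vector by the weights quoted above. The ordering of the features inside the vector and the function name are assumptions made for the example; the paper does not specify a layout.

```python
# Assumed order: three Fourier descriptors, number of loops, height-to-width ratio,
# black-point ratio, right connection position, left connection position.
FEATURE_WEIGHTS = (1, 1, 1, 40, 15, 45, 40, 40)


def weight_features(features, weights=FEATURE_WEIGHTS):
    """Scale each raw feature by its experimentally chosen weight."""
    if len(features) != len(weights):
        raise ValueError("expected %d features, got %d" % (len(weights), len(features)))
    return [w * f for w, f in zip(weights, features)]
```

Because the Gaussian components used later have diagonal covariance, such a per-feature scaling changes how strongly each feature contributes once the constant ρ is added to the variances, which is one way to read the reported gain in discrimination.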

1. For each connected component do:
1.1. Filling the cut positions with black in a temporary image (a copy of the segmented image): while traversing the outer contour, each time a pixel with color SEGMENTATION_COLOR is visited, the pixels belonging to this cut are painted black.
2. Finding the right-to-left order of the sub-words: the connected components of the temporary image are sorted by their start columns (i.e. rightmost columns) such that the rightmost sub-word becomes the first one in the order.
3. For each connected component of the temporary image (i.e. each sub-word), in the order obtained in step 2, do:
3.1. For each connected component of the segmented image (i.e. each segment) do: If (the current segment belongs to the current sub-word): it is recorded at the relevant index of an array of sub-words (i.e. it is recorded which segments each sub-word contains).
3.2. The outer contour of the current sub-word is traversed, starting from its rightmost and topmost point.
3.3. If (the segment currently being traversed is revisited AND there are some unvisited segments): this segment is moved to the end of the order of segments. Else: If (there are some unvisited segments): the current segment is added at the end of the order of segments. The traversal then continues.
3.4. If (a pixel with color SEGMENTATION_COLOR is visited): traversal continues in the next segment.
3.5. The position of each segment (i.e. first, middle, last, or isolated) in the current sub-word is specified according to the obtained order of segments.
4. The coordinates of one point of each segment are stored in an array, in the obtained order.
5. Ascenders and descenders are added to the image, new connected components are found, and their order is specified using the order of the points stored in step 4.
6. Isolated ascenders are removed from the image.

Figure 18. Finding right-to-left order algorithm.

Recognition

A CDVDHMM model [1] is used for recognition.
The characteristics that make this model suitable for our system are:
(1) The sequential nature of writing: Markov models can successfully encode sequential information.
(2) Hidden states: in the handwritten word recognition task, the system tries to recover the sequence of characters (as hidden states) from the sequence of observed features (as observations).
(3) Continuous symbol probability distribution: there is no vector quantization error in this case, and the multi-shaped property of Farsi/Arabic characters can be modeled fairly well by mixture-of-Gaussian distributions.
(4) Variable duration of states: this aspect can handle the over-segmentation problem.

We consider the pure forms of characters (i.e. without secondary strokes) and the compound forms of characters in the Nastaaligh style as the states of the model. So the number of states will be 25. As mentioned before, Farsi characters have different forms in different positions, and a character in a given position can also have different shapes. Therefore, putting all forms of a character in one class will not result in a good recognition rate. To compensate for this problem, we have defined the idea of sub-states: the different shapes of each character are considered as sub-states of the state assigned to that character. Then, in the training stage, we use the training images to obtain the parameters of each sub-state separately. The role of sub-states in the performance improvement will become clearer in the following sections.

Training the Model

In the training stage, the goal is the estimation of the model parameters λ = (Π, A, Γ, B, D) (equations (1) to (5)). As mentioned before, characters are considered as states. The states are therefore meaningful, which makes it possible to avoid re-estimation methods (e.g. the Baum-Welch method) for training, and so the training stage becomes simple [1]. Two training sources are used: the training images and the dictionary. Parameters B and D are obtained from the training images, and the other parameters from the dictionary.

Training with images

In this stage, the parameters are computed for each sub-state separately. In this subsection, the word state refers to sub-state. After using the segmentation algorithm on the training images, the state duration probabilities (D) are computed by counting the number of segments of each character manually. The probability that state $q_i$ has duration $d$, $P(d \mid q_i)$, is equal to the number of times that the character $q_i$ is segmented into $d$ parts, divided by the total number of times that this character has appeared in the training images. In our training samples, the maximum duration of a state was four. But to be able to handle worse cases, we take the maximum duration of states to be six ($d = 1, 2, \ldots, 6$).

The observation pdf (parameter B) is represented as a finite mixture of the form:

\[ b_j(x) = \sum_{m=1}^{M_j} c_{jm}\, N[x, \mu_{jm}, U_{jm}], \qquad 1 \le j \le N \tag{6} \]

where $N[\cdot]$ represents a Gaussian distribution with mean vector $\mu_{jm}$ and covariance matrix $U_{jm}$ for the $m$th mixture component at state $j$, $x$ is the vector being modeled, $M_j$ is the number of Gaussian components at state $j$, and $c_{jm}$ is the mixture coefficient for the $m$th Gaussian component at state $j$. The mixture gains satisfy the stochastic constraint:

\[ \sum_{m=1}^{M_j} c_{jm} = 1, \quad 1 \le j \le N; \qquad c_{jm} \ge 0, \quad 1 \le m \le M_j \tag{7} \]

We used the k-means clustering algorithm with a free parameter k and a fixed SNR to find the number of Gaussian functions for each state. We used the criterion $J_4 = \mathrm{tr}[S_w] / \mathrm{tr}[S_m]$ to determine a proper SNR [31]. This criterion is the trace of the within-class scattering matrix ($\mathrm{tr}[S_w]$) divided by the trace of the mixture scattering matrix ($\mathrm{tr}[S_m]$). The experimental value 0.9 was obtained as the optimal value for the terminating condition of the algorithm (SNR). The mixture coefficient $c_{jm}$ is the number of training samples in $H_{jm}$ divided by the total number of training samples for state $q_j$, where $H_{jm}$ is the set of the samples in cluster $m$ of state $q_j$. For each cluster in state $q_j$, the parameters of the Gaussian distribution are estimated as follows:

\[ \mu_{jm} = \frac{1}{N_{jm}} \sum_{x \in H_{jm}} x \tag{8} \]

\[ U_{jm} = \frac{1}{N_{jm}} \sum_{x \in H_{jm}} (x - \mu_{jm})(x - \mu_{jm})^{T} \tag{9} \]

where $x$ is the feature vector of a training sample and $N_{jm}$ is the number of samples in $H_{jm}$. The covariance matrix $U_{jm}$ is assumed to be diagonal in our implementation. Because of the limited amount of available training data, a small constant ρ is added to the diagonal elements of the covariance matrix to prevent it from becoming singular [1]. The value 0.1 is selected for ρ in our implementation. The symbol probability density for an observation $O$ is computed in the recognition stage as:

\[ b_j(O) = \sum_{m=1}^{M_j} c_{jm}\, \frac{1}{\sqrt{(2\pi)^{n} \det[U_{jm}]}} \exp\!\left[ -\frac{1}{2} (O - \mu_{jm})^{T} U_{jm}^{-1} (O - \mu_{jm}) \right] \tag{10} \]

where $n$ is the dimension of the feature vector. The observation $O$ can be composed of one or several consecutive segments. In handwritten word recognition, the shapes of consecutive segments resulting from the segmentation process depend on each other. Thus the symbol probability for a composite observation is defined as follows [1]:

\[ b_j(o_1 o_2 \ldots o_d) = \left[ b_j(O_1^d) \right]^{d} \tag{11} \]

where $O_1^d$ is the image built by merging the segment images $o_1, o_2, \ldots, o_d$ together. The power $d$ is used to balance the symbol probability for different numbers of segments. This is a necessary normalization procedure when every node in the Viterbi net is used to represent a segment [1].
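The estimation steps of this subsection can be put together in a short sketch: duration probabilities by counting, per-cluster mean and diagonal covariance as in equations (8) and (9) with ρ added to the variances, the mixture density of equation (10), and the composite observation of equation (11). It is a minimal illustration assuming that clustering has already been done and that feature vectors are plain lists of numbers; all function names are hypothetical.

```python
import math
from collections import Counter

RHO = 0.1  # constant added to the diagonal of each covariance matrix


def duration_probabilities(segment_counts, max_duration=6):
    """P(d | q_i) for d = 1..max_duration from per-occurrence segment counts."""
    counts = Counter(segment_counts)
    total = len(segment_counts)
    return [counts.get(d, 0) / total for d in range(1, max_duration + 1)]


def fit_cluster(samples, rho=RHO):
    """Mean vector and diagonal covariance (eqs. 8 and 9) of one cluster H_jm."""
    n_jm = len(samples)
    dim = len(samples[0])
    mean = [sum(x[i] for x in samples) / n_jm for i in range(dim)]
    var = [sum((x[i] - mean[i]) ** 2 for x in samples) / n_jm + rho
           for i in range(dim)]
    return mean, var


def mixture_density(obs, components):
    """b_j(O) of eq. 10; components is a list of (c_jm, mean, diagonal variances)."""
    n = len(obs)
    total = 0.0
    for coeff, mean, var in components:
        det = math.prod(var)  # determinant of a diagonal matrix
        quad = sum((o - m) ** 2 / v for o, m, v in zip(obs, mean, var))
        total += coeff * math.exp(-0.5 * quad) / math.sqrt((2 * math.pi) ** n * det)
    return total


def composite_density(merged_obs, d, components):
    """Eq. 11: density of the image merged from d segments, raised to the power d."""
    return mixture_density(merged_obs, components) ** d
```

In use, `merged_obs` would be the feature vector extracted from the image obtained by merging the d consecutive segments, so the features are computed once per merged image rather than once per segment.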


More information

PRINTED ARABIC CHARACTERS CLASSIFICATION USING A STATISTICAL APPROACH

PRINTED ARABIC CHARACTERS CLASSIFICATION USING A STATISTICAL APPROACH PRINTED ARABIC CHARACTERS CLASSIFICATION USING A STATISTICAL APPROACH Ihab Zaqout Dept. of Information Technology Faculty of Engineering & Information Technology Al-Azhar University Gaza ABSTRACT In this

More information

Indian Multi-Script Full Pin-code String Recognition for Postal Automation

Indian Multi-Script Full Pin-code String Recognition for Postal Automation 2009 10th International Conference on Document Analysis and Recognition Indian Multi-Script Full Pin-code String Recognition for Postal Automation U. Pal 1, R. K. Roy 1, K. Roy 2 and F. Kimura 3 1 Computer

More information

ABJAD: AN OFF-LINE ARABIC HANDWRITTEN RECOGNITION SYSTEM

ABJAD: AN OFF-LINE ARABIC HANDWRITTEN RECOGNITION SYSTEM ABJAD: AN OFF-LINE ARABIC HANDWRITTEN RECOGNITION SYSTEM RAMZI AHMED HARATY and HICHAM EL-ZABADANI Lebanese American University P.O. Box 13-5053 Chouran Beirut, Lebanon 1102 2801 Phone: 961 1 867621 ext.

More information

LECTURE 6 TEXT PROCESSING

LECTURE 6 TEXT PROCESSING SCIENTIFIC DATA COMPUTING 1 MTAT.08.042 LECTURE 6 TEXT PROCESSING Prepared by: Amnir Hadachi Institute of Computer Science, University of Tartu amnir.hadachi@ut.ee OUTLINE Aims Character Typology OCR systems

More information

ERD ENTITY RELATIONSHIP DIAGRAM

ERD ENTITY RELATIONSHIP DIAGRAM ENTITY RELATIONSHIP DIAGRAM M. Rasti-Barzoki Website: Entity Relationship Diagrams for Data Modelling An Entity-Relationship Diagram () shows how the data that flows in the system is organised and used.

More information

Nafees Nastaleeq v1.01 beta

Nafees Nastaleeq v1.01 beta Nafees Nastaleeq v1.01 beta Release Notes November 07, 2007 CENTER FOR RESEARCH IN URDU LANGUAGE PROCESSING NATIONAL UNIVERSITY OF COMPUTER AND EMERGING SCIENCES, LAHORE PAKISTAN Table of Contents 1 Introduction...4

More information

A MELIORATED KASHIDA-BASED APPROACH FOR ARABIC TEXT STEGANOGRAPHY

A MELIORATED KASHIDA-BASED APPROACH FOR ARABIC TEXT STEGANOGRAPHY A MELIORATED KASHIDA-BASED APPROACH FOR ARABIC TEXT STEGANOGRAPHY Ala'a M. Alhusban and Jehad Q. Odeh Alnihoud Computer Science Dept, Al al-bayt University, Mafraq, Jordan ABSTRACT Steganography is an

More information

Multifont Arabic Characters Recognition Using HoughTransform and HMM/ANN Classification

Multifont Arabic Characters Recognition Using HoughTransform and HMM/ANN Classification 50 JOURNAL OF MULTIMEDIA, VOL. 1, NO. 2, MAY 2006 Multifont Arabic Characters Recognition Using HoughTransform and HMM/ANN Classification Nadia Ben Amor National Engineering School of Tunis, Tunisia n.benamor@ttnet.tn,

More information

Linear Discriminant Analysis in Ottoman Alphabet Character Recognition

Linear Discriminant Analysis in Ottoman Alphabet Character Recognition Linear Discriminant Analysis in Ottoman Alphabet Character Recognition ZEYNEB KURT, H. IREM TURKMEN, M. ELIF KARSLIGIL Department of Computer Engineering, Yildiz Technical University, 34349 Besiktas /

More information

An Efficient Character Segmentation Based on VNP Algorithm

An Efficient Character Segmentation Based on VNP Algorithm Research Journal of Applied Sciences, Engineering and Technology 4(24): 5438-5442, 2012 ISSN: 2040-7467 Maxwell Scientific organization, 2012 Submitted: March 18, 2012 Accepted: April 14, 2012 Published:

More information

Proposed Solution for Writing Domain Names in Different Arabic Script Based Languages

Proposed Solution for Writing Domain Names in Different Arabic Script Based Languages Proposed Solution for Writing Domain Names in Different Arabic Script Based Languages TF-AIDN, June/2014 Presented by AbdulRahman I. Al-Ghadir Researcher in SaudiNIC Content What we have done so far? Problem

More information

Research Article Offline Handwritten Arabic Character Recognition Using Features Extracted from Curvelet and Spatial Domains

Research Article Offline Handwritten Arabic Character Recognition Using Features Extracted from Curvelet and Spatial Domains Research Journal of Applied Sciences, Engineering and Technology 11(2): 158-164, 2015 DOI: 10.19026/rjaset.11.1702 ISSN: 2040-7459; e-issn: 2040-7467 2015 Maxwell Scientific Publication Corp. Submitted:

More information

androidcode.ir/post/install-eclipse-windows-android-lynda

androidcode.ir/post/install-eclipse-windows-android-lynda ا موزش برنامه نويسی اندرويد آ زش ای ا رو ز ن ر دو, ۲۶ دی ۰۷:۰۶ ۱۳۹۰ ب.ظ مراحل نصب ايکليپس (Eclipse) روی ويندوز ی ) ( آ زش ا ا در و وز در pdf ا آ زش( 2.43 ( ۰. از ا اس دی رو ده (راھ ی.(SDK ۱.ا ای ا رو ازش

More information

SEVERAL METHODS OF FEATURE EXTRACTION TO HELP IN OPTICAL CHARACTER RECOGNITION

SEVERAL METHODS OF FEATURE EXTRACTION TO HELP IN OPTICAL CHARACTER RECOGNITION SEVERAL METHODS OF FEATURE EXTRACTION TO HELP IN OPTICAL CHARACTER RECOGNITION Binod Kumar Prasad * * Bengal College of Engineering and Technology, Durgapur, W.B., India. Rajdeep Kundu 2 2 Bengal College

More information

ISeCure. The ISC Int'l Journal of Information Security. High Capacity Steganography Tool for Arabic Text Using Kashida.

ISeCure. The ISC Int'l Journal of Information Security. High Capacity Steganography Tool for Arabic Text Using Kashida. The ISC Int'l Journal of Information Security July 2010, Volume 2, Number 2 (pp. 107 118) http://www.isecure-journal.org High Capacity Steganography Tool for Arabic Text Using Kashida Adnan Abdul-Aziz

More information

Arabic Diacritics Based Steganography Mohammed A. Aabed, Sameh M. Awaideh, Abdul-Rahman M. Elshafei and Adnan A. Gutub

Arabic Diacritics Based Steganography Mohammed A. Aabed, Sameh M. Awaideh, Abdul-Rahman M. Elshafei and Adnan A. Gutub Arabic Diacritics Based Steganography Mohammed A. Aabed, Sameh M. Awaideh, Abdul-Rahman M. Elshafei and Adnan A. Gutub King Fahd University of Petroleum and Minerals Computer Engineering Department Dhahran

More information

Recognition of online captured, handwritten Tamil words on Android

Recognition of online captured, handwritten Tamil words on Android Recognition of online captured, handwritten Tamil words on Android A G Ramakrishnan and Bhargava Urala K Medical Intelligence and Language Engineering (MILE) Laboratory, Dept. of Electrical Engineering,

More information

Developing a Real Time Method for the Arabic Heterogonous DBMS Transformation

Developing a Real Time Method for the Arabic Heterogonous DBMS Transformation Developing a Real Time Method for the Arabic Heterogonous DBMS Transformation S. M. Hadi, S. Murtatha Department of Information & Comm. Eng. College of Engineering Al- Khawarizmi,University of Baghdad

More information

Recognition of Unconstrained Malayalam Handwritten Numeral

Recognition of Unconstrained Malayalam Handwritten Numeral Recognition of Unconstrained Malayalam Handwritten Numeral U. Pal, S. Kundu, Y. Ali, H. Islam and N. Tripathy C VPR Unit, Indian Statistical Institute, Kolkata-108, India Email: umapada@isical.ac.in Abstract

More information

Building Multi Script OCR for Brahmi Scripts: Selection of Efficient Features

Building Multi Script OCR for Brahmi Scripts: Selection of Efficient Features Building Multi Script OCR for Brahmi Scripts: Selection of Efficient Features Md. Abul Hasnat Center for Research on Bangla Language Processing (CRBLP) Center for Research on Bangla Language Processing

More information

Managing Resource Sharing Conflicts in an Open Embedded Software Environment

Managing Resource Sharing Conflicts in an Open Embedded Software Environment Managing Resource Sharing Conflicts in an Open Embedded Software Environment Koutheir Attouchi To cite this version: Koutheir Attouchi. Managing Resource Sharing Conflicts in an Open Embedded Software

More information

Tracing and Straightening the Baseline in Handwritten Persian/Arabic Text-line: A New Approach Based on Painting-technique

Tracing and Straightening the Baseline in Handwritten Persian/Arabic Text-line: A New Approach Based on Painting-technique Tracing and Straightening the Baseline in Handwritten Persian/Arabic Text-line: A New Approach Based on Painting-technique P. Nagabhushan and Alireza Alaei 1,2 Department of Studies in Computer Science,

More information

Cursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network

Cursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network Cursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network Utkarsh Dwivedi 1, Pranjal Rajput 2, Manish Kumar Sharma 3 1UG Scholar, Dept. of CSE, GCET, Greater Noida,

More information

Award Winning Typefaces by Linotype

Award Winning Typefaces by Linotype Newly released fonts and Midan awarded coveted design prize Award Winning Typefaces by Linotype Bad Homburg, 23 April 2007. Linotype has once again received critical recognition for their commitment to

More information

CS443: Digital Imaging and Multimedia Binary Image Analysis. Spring 2008 Ahmed Elgammal Dept. of Computer Science Rutgers University

CS443: Digital Imaging and Multimedia Binary Image Analysis. Spring 2008 Ahmed Elgammal Dept. of Computer Science Rutgers University CS443: Digital Imaging and Multimedia Binary Image Analysis Spring 2008 Ahmed Elgammal Dept. of Computer Science Rutgers University Outlines A Simple Machine Vision System Image segmentation by thresholding

More information

A Survey of Problems of Overlapped Handwritten Characters in Recognition process for Gurmukhi Script

A Survey of Problems of Overlapped Handwritten Characters in Recognition process for Gurmukhi Script A Survey of Problems of Overlapped Handwritten Characters in Recognition process for Gurmukhi Script Arwinder Kaur 1, Ashok Kumar Bathla 2 1 M. Tech. Student, CE Dept., 2 Assistant Professor, CE Dept.,

More information

3 Qurʾānic typography Qurʾānic typography involves getting the following tasks done.

3 Qurʾānic typography Qurʾānic typography involves getting the following tasks done. TUGboat, Volume 31 (2010), No. 2 197 Qurʾānic typography comes of age: Æsthetics, layering, and paragraph optimization in ConTEXt Idris Samawi Hamid 1 The background of Oriental TEX Attempts to integrate

More information

An Accurate and Efficient System for Segmenting Machine-Printed Text. Yi Lu, Beverly Haist, Laurel Harmon, John Trenkle and Robert Vogt

An Accurate and Efficient System for Segmenting Machine-Printed Text. Yi Lu, Beverly Haist, Laurel Harmon, John Trenkle and Robert Vogt An Accurate and Efficient System for Segmenting Machine-Printed Text Yi Lu, Beverly Haist, Laurel Harmon, John Trenkle and Robert Vogt Environmental Research Institute of Michigan P. O. Box 134001 Ann

More information

Brand Identity Manual Fonts and Typography

Brand Identity Manual Fonts and Typography Brand Identity Manual Fonts and Typography Brand Fonts NCB TYPE LANGUAGE To maintain a coherent brand identity, NCB uses a set of compatible and typefaces as its corporate primary fonts. These adjacent

More information

The University of Bradford Institutional Repository

The University of Bradford Institutional Repository The University of Bradford Institutional Repository http://bradscholars.brad.ac.uk This work is made available online in accordance with publisher policies. Please refer to the repository record for this

More information

Feature Extraction Techniques of Online Handwriting Arabic Text Recognition

Feature Extraction Techniques of Online Handwriting Arabic Text Recognition 2013 5th International Conference on Information and Communication Technology for the Muslim World. Feature Extraction Techniques of Online Handwriting Arabic Text Recognition Mustafa Ali Abuzaraida 1,

More information

Farsi Handwritten Word Recognition Using Discrete HMM and Self- Organizing Feature Map

Farsi Handwritten Word Recognition Using Discrete HMM and Self- Organizing Feature Map 0 International Congress on Informatics, Environment, Energy and Applications-IEEA 0 IPCSIT vol.38 (0 (0 IACSIT Press, Singapore Farsi Handwritten Word Recognition Using Discrete H and Self- Organizing

More information

BÉZIER CURVES TO RECOGNIZE MULTI-FONT ARABIC ISOLATED CHARACTERS

BÉZIER CURVES TO RECOGNIZE MULTI-FONT ARABIC ISOLATED CHARACTERS BÉZIER CURVES TO RECOGNIZE MULTI-FONT ARABIC ISOLATED CHARACTERS AzzedineMazroui and AissaKerkourElmiad Faculty of Sciences, Oujda, Morroco azze.mazroui@gmail.com, kerkour8@yahoo.fr ABSTRACT The recognition

More information

ISO/IEC JTC 1/SC 2. Yoshiki MIKAMI, SC 2 Chair Toshiko KIMURA, SC 2 Secretariat JTC 1 Plenary 2012/11/05-10, Jeju

ISO/IEC JTC 1/SC 2. Yoshiki MIKAMI, SC 2 Chair Toshiko KIMURA, SC 2 Secretariat JTC 1 Plenary 2012/11/05-10, Jeju ISO/IEC JTC 1/SC 2 Yoshiki MIKAMI, SC 2 Chair Toshiko KIMURA, SC 2 Secretariat 2012 JTC 1 Plenary 2012/11/05-10, Jeju what is new Work items ISO/IEC 10646 2 nd ed. 3 rd ed. (2012) ISO/IEC 14651 Amd.1-2

More information

A Segmentation Free Approach to Arabic and Urdu OCR

A Segmentation Free Approach to Arabic and Urdu OCR A Segmentation Free Approach to Arabic and Urdu OCR Nazly Sabbour 1 and Faisal Shafait 2 1 Department of Computer Science, German University in Cairo (GUC), Cairo, Egypt; 2 German Research Center for Artificial

More information

L E A R N I N G B A G - O F - F E AT U R E S R E P R E S E N TAT I O N S F O R H A N D W R I T I N G R E C O G N I T I O N

L E A R N I N G B A G - O F - F E AT U R E S R E P R E S E N TAT I O N S F O R H A N D W R I T I N G R E C O G N I T I O N L E A R N I N G B A G - O F - F E AT U R E S R E P R E S E N TAT I O N S F O R H A N D W R I T I N G R E C O G N I T I O N leonard rothacker Diploma thesis Department of computer science Technische Universität

More information

Short Survey on Static Hand Gesture Recognition

Short Survey on Static Hand Gesture Recognition Short Survey on Static Hand Gesture Recognition Huu-Hung Huynh University of Science and Technology The University of Danang, Vietnam Duc-Hoang Vo University of Science and Technology The University of

More information

Word Matching of handwritten scripts

Word Matching of handwritten scripts Word Matching of handwritten scripts Seminar about ancient document analysis Introduction Contour extraction Contour matching Other methods Conclusion Questions Problem Text recognition in handwritten

More information

OFF-LINE HANDWRITTEN JAWI CHARACTER SEGMENTATION USING HISTOGRAM NORMALIZATION AND SLIDING WINDOW APPROACH FOR HARDWARE IMPLEMENTATION

OFF-LINE HANDWRITTEN JAWI CHARACTER SEGMENTATION USING HISTOGRAM NORMALIZATION AND SLIDING WINDOW APPROACH FOR HARDWARE IMPLEMENTATION OFF-LINE HANDWRITTEN JAWI CHARACTER SEGMENTATION USING HISTOGRAM NORMALIZATION AND SLIDING WINDOW APPROACH FOR HARDWARE IMPLEMENTATION Zaidi Razak 1, Khansa Zulkiflee 2, orzaily Mohamed or 3, Rosli Salleh

More information

NF-SAVO: Neuro-Fuzzy system for Arabic Video OCR

NF-SAVO: Neuro-Fuzzy system for Arabic Video OCR NF-SAVO: Neuro-Fuzzy system for Arabic Video OCR Mohamed Ben Halima, Hichem karray, Adel. M. Alimi REGIM: REsearch Group on Intelligent Machines University of Sfax, National School of Engineers (ENIS)

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION CHAPTER 1 INTRODUCTION 1.1 Introduction Pattern recognition is a set of mathematical, statistical and heuristic techniques used in executing `man-like' tasks on computers. Pattern recognition plays an

More information

with Profile's Amplitude Filter

with Profile's Amplitude Filter Arabic Character Segmentation Using Projection-Based Approach with Profile's Amplitude Filter Mahmoud A. A. Mousa Dept. of Computer and Systems Engineering, Zagazig University, Zagazig, Egypt mamosa@zu.edu.eg

More information

A Modified Method for Calculating Reliability Allocation in Series Parallel Systems

A Modified Method for Calculating Reliability Allocation in Series Parallel Systems A Modified Method for Calculating eliability Allocation in Series Parallel Systems Abstract: In order to improve system reliability, designers may introduce a system of different technologies in parallel.

More information

A System for Joining and Recognition of Broken Bangla Numerals for Indian Postal Automation

A System for Joining and Recognition of Broken Bangla Numerals for Indian Postal Automation A System for Joining and Recognition of Broken Bangla Numerals for Indian Postal Automation K. Roy, U. Pal and B. B. Chaudhuri CVPR Unit; Indian Statistical Institute, Kolkata-108; India umapada@isical.ac.in

More information

A) 9 B) 12 C) 24 D) 32 A) 9 B) 7 C) 8 D) 12 A) 400 B) 420 C) 460 D) 480 A) 16 B) 12 C) 8 D) 4. A) n+1 B) n - 1 C) n D) None of the above

A) 9 B) 12 C) 24 D) 32 A) 9 B) 7 C) 8 D) 12 A) 400 B) 420 C) 460 D) 480 A) 16 B) 12 C) 8 D) 4. A) n+1 B) n - 1 C) n D) None of the above 1 4 4 ادب ر اور 2.ر ں رHistory3 ا ض ا رس ا ا ب A) 9 B) 12 C) 24 D) 32 ال 1 ں ا ا ر فا رس ا ب A) 9 B) 7 C) 8 D) 12 "BENZENE" وف ا ل ت ا ظ A) 400 B) 420 C) 460 D) 480 1 2 3 C B A اور D ں combinations ا 3

More information

Available online at ScienceDirect. Procedia Computer Science 45 (2015 )

Available online at  ScienceDirect. Procedia Computer Science 45 (2015 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 45 (2015 ) 205 214 International Conference on Advanced Computing Technologies and Applications (ICACTA- 2015) Automatic

More information

Image representation. 1. Introduction

Image representation. 1. Introduction Image representation Introduction Representation schemes Chain codes Polygonal approximations The skeleton of a region Boundary descriptors Some simple descriptors Shape numbers Fourier descriptors Moments

More information

Language Manual. Arabic HQ and HD

Language Manual. Arabic HQ and HD Language Manual Arabic HQ and HD Language Manual: Arabic HQ and HD Published 22 March 2011 Copyright 2003-2011 Acapela Group. All rights reserved This document was produced by Acapela Group. We welcome

More information

ON OPTICAL CHARACTER RECOGNITION OF ARABIC TEXT

ON OPTICAL CHARACTER RECOGNITION OF ARABIC TEXT The 6th Saudi Engineering Conference, KFUPM, Dhahran, December 2002 Vol. 4. 109 ON OPTICAL CHARACTER RECOGNITION OF ARABIC TEXT Abdelmalek Zidouri 1, Muhammad Sarfraz 2 1: Assistant professor, Electrical

More information

II. WORKING OF PROJECT

II. WORKING OF PROJECT Handwritten character Recognition and detection using histogram technique Tanmay Bahadure, Pranay Wekhande, Manish Gaur, Shubham Raikwar, Yogendra Gupta ABSTRACT : Cursive handwriting recognition is a

More information

Center for Language Engineering Al-Khawarizmi Institute of Computer Science University of Engineering and Technology, Lahore

Center for Language Engineering Al-Khawarizmi Institute of Computer Science University of Engineering and Technology, Lahore Sarmad Hussain Sarmad Hussain Center for Language Engineering Al-Khawarizmi Institute of Computer Science University of Engineering and Technology, Lahore www.cle.org.pk sarmad@cantab.net Arabic Script

More information

Printed and Handwritten Arabic Characters Recognition and Convert It to Editable Text Using K-NN and Fuzzy Logic Classifiers

Printed and Handwritten Arabic Characters Recognition and Convert It to Editable Text Using K-NN and Fuzzy Logic Classifiers Journal of University of Thi-Qar Vol.9 No. Mar.4 Printed and Handwritten Arabic Characters Recognition and Convert It to Editable Text Using K-NN and Fuzzy Logic Classifiers Zamen F. Jaber Computer Department,

More information

Packet Steganography Using IP ID. Abstract

Packet Steganography Using IP ID. Abstract Packet Steganography Using IP ID *1 Head of Computer Science Department, Collage of Science, University of Diyala, Iraq 2* Computer Eng., University of Technology, Baghdad, Iraq Received 21 January 2014

More information

Lecture 8 Object Descriptors

Lecture 8 Object Descriptors Lecture 8 Object Descriptors Azadeh Fakhrzadeh Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University 2 Reading instructions Chapter 11.1 11.4 in G-W Azadeh Fakhrzadeh

More information

Assignment 2. Unsupervised & Probabilistic Learning. Maneesh Sahani Due: Monday Nov 5, 2018

Assignment 2. Unsupervised & Probabilistic Learning. Maneesh Sahani Due: Monday Nov 5, 2018 Assignment 2 Unsupervised & Probabilistic Learning Maneesh Sahani Due: Monday Nov 5, 2018 Note: Assignments are due at 11:00 AM (the start of lecture) on the date above. he usual College late assignments

More information

Learning-Based Candidate Segmentation Scoring for Real-Time Recognition of Online Overlaid Chinese Handwriting

Learning-Based Candidate Segmentation Scoring for Real-Time Recognition of Online Overlaid Chinese Handwriting 2013 12th International Conference on Document Analysis and Recognition Learning-Based Candidate Segmentation Scoring for Real-Time Recognition of Online Overlaid Chinese Handwriting Yan-Fei Lv 1, Lin-Lin

More information

Design and Implementation of Electronic Board Advertisement Based on Microcontroller PIC16F887and Multi Character Button Keypad

Design and Implementation of Electronic Board Advertisement Based on Microcontroller PIC16F887and Multi Character Button Keypad Eng. & Tech. Journal, Vol. 31, No. 1, 2013 Design and Implementation of Electronic Board Advertisement Based on Microcontroller PIC16F887and Multi Character Button Keypad Mousa Kadhim Wali Electrical and

More information

Mono-font Cursive Arabic Text Recognition Using Speech Recognition System

Mono-font Cursive Arabic Text Recognition Using Speech Recognition System Mono-font Cursive Arabic Text Recognition Using Speech Recognition System M.S. Khorsheed Computer & Electronics Research Institute, King AbdulAziz City for Science and Technology (KACST) PO Box 6086, Riyadh

More information