Chapter 2. Components


OCR: General Architecture and Components

In some areas that require the automation of human intelligence, such as chess playing, tremendous improvements have been achieved over the last few decades. On the other hand, humans still outperform even the most powerful computers at relatively routine functions such as vision. Automation of character recognition is one such area; it has been the subject of intense research for the last three decades, yet many problems remain open. The literature survey shows that extensive work has been carried out on English-language OCR. Researchers from different countries have also worked on their own languages, giving rise to technologies suited to those languages, but the results are still not satisfactory. We studied papers covering international languages such as English, Chinese, Korean, Japanese, Mongolian, Thai and Arabic. A few papers were available on Indian languages such as Bangla, Devanagari, Tamil, Telugu, Malayalam and Kannada. Since the amount of HCR work on Indian languages was found to be limited, we broadened the survey to include offline/online and printed/handwritten recognition across languages, with a view to identifying useful directions and guidelines from these approaches. The different architectural stages and the technologies used in each stage are analyzed. Though the specifics of architecture and model vary with the approach used, in general an OCR system has four stages, as shown in figure 2.1.

[Figure 2.1 General Architecture of OCR: input raw data -> preprocessing -> feature extraction -> classification -> post-processing -> recognition result]

The extensive survey investigated the kinds of methods used by different researchers in each stage of an OCR system. As the processing methods of the different stages depend on data acquisition, we also analyzed the ways of acquiring the images and the different parameters that influence the input data.

2.1 Input Data

Since the invention of the printing press in the fifteenth century by Johannes Gutenberg, most archived written language has been in the form of printed-paper documents. In such documents, text is presented as a visual image, mostly in black on a high-contrast background that is generally white. Written language is also encountered in the form of handwriting inscribed on paper or registered on an electronically sensitive surface. Some of the major aspects of collecting and storing the input image are discussed below.

The image: The image acquired for recognition by an OCR system can be a page of text, a word or a character. A page of text is available only in the offline case, and it may be printed text, handwritten text or a mixture of both. In online cases, the image is of a cursive-style or mixed-style [refer ] handwritten word or character.

Online acquisition: In the case of online writing, a special pen is used to write on an electronic surface such as a Liquid Crystal Display (LCD). The digitizers are usually electromagnetic-electrostatic tablets which send the coordinates of the pen tip to the host computer at regular intervals. Hence the noise generated along with the data is low and can be controlled by the writer. This gives the image of the character or word along with information on the movement of the pen in the horizontal (x) and vertical (y) directions, captured as two separate 1-D digital signals while writing.
There are different devices with different sensors to collect the online information while writing. Devices like PDAs and Tablet PCs are transparent position-sensing devices, and they also give the pen-tip

position coordinates along with a visual display. Some devices like the super pen can also measure the pressure exerted while writing.

Offline acquisition: In the case of offline writing, the printed or handwritten data is converted to digital form either by scanning the writing on paper using a scanner or by capturing a still image using a camera, producing offline text images. This process yields a digital image in a specified image file format. Cameras are used in cases where fragile documents need to be preserved: such paper documents cannot be forced flat, and the light source for digital cameras is usually uneven. These factors require camera-captured input images to be handled differently from scanned images.

Resolution: The resolution of the captured character image plays an important role in deciding the quality of the input image. Since the character has two states, with a black trace of the character contour on white paper, capturing the image as a binary image can reduce the tonal resolution and hence the storage space. In some cases, this can result in character images with broken edges, background noise, etc., due to improper ink flow, non-uniform paper quality, and so on. A reduction in spatial resolution results in step changes in the contour. The contour of the character image will be smooth and continuous only when both the tonal (gray-level) and spatial (dots/inch) resolutions are good. As the resolution increases, the storage space and the processing time increase. To capture a text image with the same quality of information as on paper, both resolutions should be set based on the smallest character size and on fine details such as small strokes and circular shapes in a character. Hence it is necessary to acquire the image with good resolution and with proper brightness and contrast adjustments, so that the OCR system can convert it into a binary image and make the character shapes within the text image suitable for recognition.
Even though color scanners and tablets enable data acquisition at high resolution, there is always a trade-off between the acquired image quality and the complexity of the algorithms, which limits the recognition rate.

Storage requirements: The raw-data storage requirements differ widely depending on the acquisition device and the resolution settings. For example, an average cursively written word requires about 230 bytes in the online case (sampled at 100 samples/sec) and about 80 Kbytes in the offline case (scanned at 300 dots per inch (dpi)).

A typical 8.5 x 11 inch page scanned at a resolution of 300 dpi with 256 gray levels needs 8.4 megabytes of space.

File format: Standard image file formats like BMP, GIF, PNG, JPG and TIFF are used to store the image. Choosing the right file format is of vital importance: each format suits a specific type of image, and matching the image to the correct format yields a small file size, good quality and a fast-loading graphic. The user can view the image on a computer screen using standards like VGA (640x480 pixels with 16 colors or gray shades) and SVGA (800x600 or 1024x768 pixels with 256 colors or gray shades).

2.2 Preprocessing

Most recognition methods use feature extraction to assign an image to a prototype class. The recognition accuracy strongly depends on the selected features, and feature extraction in turn depends on how well the text is represented in the image with respect to the background. This text can be a page, a word or a character. In online cases, the text is usually a character or a word; in offline cases, it is usually a page of a document. As the acquired images contain various types of noise, for the reasons stated in sections and 1.4.2, they need to be preprocessed to remove all unwanted information and leave only the text prominent. The preprocessing stage is thus concerned with processing the acquired image to make it suitable for feature extraction, so that only the shape of the character or word contributes to the features. The intention is to make the strokes forming the character of uniform thickness, with no unintended breaks, and to ensure that no other marks are present in the image. This is generally a hard problem; for example, distinguishing a dot that is part of the character from one caused by noise. Hence preprocessing is in itself a research area, and many researchers concentrate on this domain of character recognition.
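The storage figures quoted above follow from simple arithmetic; the sketch below reproduces them (the function name `page_bytes` is illustrative, not from the source):

```python
# Back-of-envelope storage estimate for an uncompressed page scan.
def page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw (uncompressed) size of a scanned page in bytes."""
    pixels = (width_in * dpi) * (height_in * dpi)
    return pixels * bits_per_pixel // 8

# 8.5 x 11 inch page at 300 dpi, 256 gray levels (8 bits/pixel)
size = page_bytes(8.5, 11, 300, 8)
print(size / 1e6)  # about 8.4 megabytes
```

A binary scan of the same page (1 bit/pixel) would need one eighth of this, which is why binarization is attractive despite its risks.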
The preprocessing techniques broadly fall under the domain of image processing. The main objectives of preprocessing are noise reduction, normalization of the data and compression of information. Some of the techniques used to achieve these objectives are discussed briefly below, explaining their purpose and the commonly used methods.

2.2.1 Noise reduction

Text-image distortions such as disconnected edges, bumps and gaps in edges, filled loops, local intensity variations and non-uniform edge thickness need to be eliminated. Many techniques have been proposed in the literature to reduce the effect of these noises. The standard techniques are available in image-processing toolboxes; depending on their requirements, some researchers have developed their own specialized noise-reduction algorithms.

Filtering

Filters are used to remove or diminish the effect of noise and to enhance the edges for better discrimination from the background. The basic idea is to convolve a pre-defined mask with the image. The mask size decides the neighborhood distance considered in the computation. For example, a 3x3 mask considers all neighborhood pixels at a distance of one: the immediate 8 neighbors of the center pixel are used in the decision. The convolution computes a new value for the center pixel as a function of the gray values of its m x m neighborhood pixels x(i-k, j-l) and the mask weights w_kl, as given by (2.1):

    y(i, j) = Σ_k Σ_l w_kl x(i-k, j-l)        (2.1)

The mask size has an effect on the noise: as the mask becomes bigger, more neighborhood pixels participate in the decision and may introduce unexpected distortions. Various spatial- and frequency-domain filters can be designed for smoothing, sharpening, contrast adjustment, salt-and-pepper noise elimination, thresholding, etc. Some of these are explained below; their effect is shown in figure 2.2.

[Figure 2.2 Effect of filters: (a) original image, (b) smoothing, (c) sharpening, (d) contrast adjustment, (e) salt-and-pepper noise, (f) salt-and-pepper noise removed, (g) Gaussian noise, (h) Gaussian low-pass filter, (i) Gaussian smoothing, (j) contrast adjustment]
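The neighborhood weighting of equation (2.1) can be sketched in Python; this is a minimal illustration using SciPy's `convolve` with toy values, not code from the source:

```python
import numpy as np
from scipy.ndimage import convolve

# A tiny "image": a bright ring with a dark hole at the centre.
img = np.array([[0, 0, 0, 0, 0],
                [0, 9, 9, 9, 0],
                [0, 9, 0, 9, 0],
                [0, 9, 9, 9, 0],
                [0, 0, 0, 0, 0]], dtype=float)

# 3x3 averaging mask: every weight w_kl = 1/9, so each output pixel
# becomes the mean of its immediate 8 neighbours and itself.
mask = np.ones((3, 3)) / 9.0

smoothed = convolve(img, mask, mode='constant', cval=0.0)
print(smoothed[2, 2])  # centre hole pulled towards its bright neighbours: 8.0
```

A larger mask would average over a wider neighborhood, smoothing more strongly but also blurring fine strokes, which is exactly the trade-off the text describes.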

Smoothing: Ragged edges due to Gaussian noise and writer, paper and pen imperfections are smoothed using these filters. Common smoothing filters are the averaging filter, the Wiener filter, and low-pass filters such as the Gaussian and Butterworth filters. As smoothing cuts high-frequency components, it will blur the image if not used carefully.

Sharpening: To enhance the sharpness of the edges, filters such as the high-boost filter, Gaussian filter and high-pass filters are used. These filters must be used carefully to avoid the noise becoming dominant.

Removal of salt-and-pepper noise: This noise, also called speckle noise, consists of distinct white (salt) and black (pepper) spots in the image. An order-statistic filter, the median filter, is used to remove it. The size of the mask should be chosen carefully: if the noisy pixels number more than half of the pixels used in the median computation, the noise cannot be eliminated. Hence the mask size is chosen depending on the density of the noise. However, if the edge density is low, this filter may produce broken edges.

Contrast adjustment: Low-contrast character images are enhanced by making dark portions darker and bright portions brighter. The position and slope of the input-output gray-level mapping curve, around which both the dark and bright portions are stretched, should be chosen carefully. This deepens the valley between the dark and bright regions and hence simplifies thresholding, but there is a possibility of the background becoming foreground and vice versa.

Thresholding: To increase processing speed, it is often desirable to represent gray-scale or color images as binary images by picking a threshold value such that everything above that value is set to 1 and everything below it is set to 0.
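The median filter's behaviour on salt-and-pepper noise can be shown in a few lines; this is a toy illustration using SciPy's `median_filter`, not code from the source:

```python
import numpy as np
from scipy.ndimage import median_filter

img = np.full((5, 5), 200, dtype=np.uint8)
img[2, 2] = 0    # a "pepper" pixel
img[0, 4] = 255  # a "salt" pixel

# 3x3 median: an isolated speck is outvoted by its 8 clean neighbours.
cleaned = median_filter(img, size=3)
print(cleaned[2, 2])  # 200: the dark speck is removed
```

If more than half the pixels in the window were noisy, the median itself would be a noise value, which is the failure case described above.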
According to a survey [162], these algorithms have evolved from global thresholding to local adaptive thresholding to allow for variations in the image background; today they range from relatively simple algorithms to some that are rather complex. Global thresholding picks one threshold value for the entire document image, often based on an estimate of the background level from the intensity histogram of the image. Adaptive (local) thresholding is used for images in which different regions may require different threshold values.
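Otsu's criterion, discussed below, can be implemented directly from the gray-level histogram. The sketch here is a straightforward (unoptimized) rendering of the minimum-within-class-variance idea, with a toy image; it is not the implementation used by the works cited:

```python
import numpy as np

def otsu_threshold(img):
    """Global Otsu threshold: minimise the weighted sum of the
    within-class variances over the gray-level histogram."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, np.inf
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()   # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (levels[:t] * p[:t]).sum() / w0
        mu1 = (levels[t:] * p[t:]).sum() / w1
        var0 = ((levels[:t] - mu0) ** 2 * p[:t]).sum() / w0
        var1 = ((levels[t:] - mu1) ** 2 * p[t:]).sum() / w1
        within = w0 * var0 + w1 * var1      # weighted within-class variance
        if within < best_var:
            best_t, best_var = t, within
    return best_t

# Dark "text" (near 30) on a bright background (near 220):
img = np.array([[30, 35, 220], [25, 210, 225], [215, 220, 230]], dtype=np.uint8)
t = otsu_threshold(img)
binary = (img >= t).astype(np.uint8)  # 1 = background, 0 = text
```

The threshold lands in the valley between the two histogram modes, separating ink from background with one global value.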

In [164], a comparison of many thresholding techniques is given, using word-recognition power as the evaluation criterion to check the accuracy of a character recognition system. Of those tested, Niblack's locally adaptive method produces the best result. In this method, the threshold for each pixel is determined by examining the mean m(x,y) and the standard deviation s(x,y) of the pixels in its neighborhood. In [27], a comparative performance evaluation of thresholding algorithms applied to OCR is done, using the Hausdorff, Jaccard and Yule distance measures to quantify the similarity between the thresholded and original images. When all the measured parameters of each method are summed, Lloyd, Otsu, local average thresholding based on Otsu, the Local Contrast Technique (LCT) and the Nonlinear Dynamic Method (NDA) score highest. Otsu's thresholding is used by many researchers [18][16][84]. Otsu binarization is a global thresholding method in which the threshold is fixed by minimizing the weighted sum of the within-class variances of the foreground and background pixels, computed from the gray-level histogram [163].

Thinned edge extraction: The contour thickness of a thresholded image varies from image to image, and may also vary within an image due to factors such as the pressure exerted while writing, the pen-tip width and the ink flow. As the contour thickness influences the feature-extraction computations, the character shape can be conveyed by a single-pixel-width trace of the contour. To extract such a contour, techniques like the Laplacian of Gaussian (LoG) and the Canny edge detector are used. These methods respond at the transitions from white to black and from black to white; hence these operators produce double edge responses, as shown in figure 2.3, and preserve the edge-thickness information.

[Figure 2.3 Edge extraction: effect of the LoG and Canny edge detection techniques]
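Niblack's local rule, T(x,y) = m(x,y) + k * s(x,y), can be sketched with windowed means computed by uniform filtering. This is an illustrative implementation under assumed defaults (window 15, k = -0.2), not the exact variant evaluated in [164]:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def niblack(img, window=15, k=-0.2):
    """Niblack's locally adaptive threshold: T(x,y) = m(x,y) + k*s(x,y),
    where m and s are the mean and standard deviation of the window
    centred at (x,y). Returns True for background, False for ink."""
    img = img.astype(float)
    mean = uniform_filter(img, window)
    sq_mean = uniform_filter(img * img, window)
    std = np.sqrt(np.maximum(sq_mean - mean * mean, 0))
    return img >= (mean + k * std)
```

Because the threshold adapts to each neighborhood, a dark stroke is detected even where the page illumination varies, which a single global threshold cannot guarantee.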

Morphological operations

Morphology is mathematically a set theory used for the analysis and processing of geometrical shapes in an image. Morphological operations can be applied to both binary and gray-level images; as character images are binary, we consider here only binary morphological operations. These operations pass a structuring element over an input image, creating an output image of the same size. The structuring element is used to construct a morphological operation that is sensitive to specific shapes in the input image. The structuring element is itself a binary image, with an origin indicating the position of the pixel to be modified and a size indicating the neighborhood shape to be considered. Some examples of structuring elements are shown in figure 2.4; the center of the structuring element is marked with O, indicating the pixel position to be modified based on the filter effect. There may be more than one structuring element, and the output pixel under the origin of the structuring element may be decided immediately, or may be marked first and decided later based on further neighborhood observations.

[Figure 2.4 Structuring elements with O as origin: 2x2 and 3x3 neighborhoods]

The two basic morphological operations, given in equation (2.2), are dilation and erosion:

    Dilation:  D(I, A) = I ⊕ A = { x | (Â)x ∩ I ≠ ∅ }
    Erosion:   E(I, A) = I ⊖ A = { x | (Â)x ⊆ I }        (2.2)

where I and A are the sets of pixels in the input image and the structuring element respectively, and (Â)x means the reflection of A about its origin followed by a shift by x positions. One can choose a structuring element that incorporates these effects and hence avoid performing them explicitly. The results depend on the size, shape and origin chosen

for the structuring element. In dilation, when any pixel in the structuring element matches a black pixel (in our case) in the input image, the output pixel under the origin is set to black. This tends to close holes in the image by expanding the black regions; it also makes objects larger by turning every background pixel that touches the object into an object pixel. In erosion, only when every pixel in the structuring element matches a black pixel in the input image is the output pixel under the origin set to black. This tends to make objects smaller by turning object pixels that touch the background into background pixels. Various morphological operations can be defined using these basic operations, and new structuring elements can be designed to smooth the contour, join broken edges, prune wild points and edges, open joints, find the skeleton of the character, thin the character, extract boundaries, clean noise, etc. Some of these are discussed below.

Opening: Opening smooths the contours, breaks narrow bridges and eliminates thin protrusions; thus opening separates objects that are just touching one another. Opening of an image I is performed with two basic operations, erosion (E) followed by dilation (D), using the same structuring element A, as shown in equation (2.3). This technique is useful for eliminating small islands and thin filaments of object pixels.

    OPEN(I, A) = D(E(I, A), A)        (2.3)

Closing: Closing fuses narrow breaks and eliminates small holes; thus closing fills narrow gaps and joins contours. Closing of an image I is performed with dilation (D) followed by erosion (E), using the same structuring element A, as shown in equation (2.4). This technique is useful for eliminating small islands and thin filaments of background pixels.

    CLOSE(I, A) = E(D(I, A), A)        (2.4)

Boundary extraction: This is used to extract the boundary of the binary image.
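Opening and closing, as in equations (2.3) and (2.4), can be composed from SciPy's erosion and dilation primitives. A toy illustration (the image values are invented for demonstration):

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

# 3x3 structuring element.
A = np.ones((3, 3), dtype=bool)

img = np.zeros((9, 9), dtype=bool)
img[1:8, 1:8] = True   # a 7x7 object...
img[4, 4] = False      # ...with a one-pixel hole
img[0, 0] = True       # and an isolated noise pixel

# Opening: erosion then dilation (equation 2.3).
opened = binary_dilation(binary_erosion(img, A), A)
# Closing: dilation then erosion (equation 2.4).
closed = binary_erosion(binary_dilation(img, A), A)

print(opened[0, 0])  # False: the isolated island is removed
print(closed[4, 4])  # True:  the small hole is filled
```

Note the asymmetry: opening removes the noise island but leaves the hole, while closing fills the hole; in practice the two are often combined.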
Hence it is also called the morphological gradient operator. The object boundary is extracted by first eroding the image I by the structuring element A and then taking the set difference between I and its erosion, as given by equation (2.5). The size of the structuring element A decides the boundary thickness; for example, a 3x3 element gives a one-pixel-wide boundary.
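This set difference between the image and its erosion, as defined in equation (2.5), is a one-liner with SciPy; a toy illustration:

```python
import numpy as np
from scipy.ndimage import binary_erosion

A = np.ones((3, 3), dtype=bool)  # 3x3 element -> one-pixel-wide boundary

img = np.zeros((7, 7), dtype=bool)
img[1:6, 1:6] = True  # a solid 5x5 object

# Boundary = I minus the erosion of I.
boundary = img & ~binary_erosion(img, A)
print(boundary.sum())  # 16: the one-pixel outline of the 5x5 square
```

A larger structuring element erodes more, so the difference (and hence the boundary) becomes correspondingly thicker.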

    BOUNDARY(I, A) = I - (I ⊖ A)        (2.5)

Thinning / skeletonization: In OCR problems, most of the information can be extracted from the shape of the strokes. As the stroke thickness is influenced by a number of factors, such as the pen tip, ink flow, paper quality and pressure exerted, stroke-shape extraction becomes difficult. There is therefore a need to remove redundant pixels, and a skeleton of a given character or stroke is in most cases sufficient for recognition. Thinning and skeletonization methods generate a minimally connected line that is equidistant from the boundaries. Skeletonization is a reconstructable thinning algorithm that preserves all details of the internal structure of the objects within the resulting skeleton; hence the original object can be reconstructed from its skeleton. Thinning is similar, as a thinned image is a skeleton of the object, but with some loss of information, and hence cannot be used for object reconstruction. The behavior of these operations depends on the structuring element, as it determines the situations in which an object pixel is set to background. A group of structuring elements is used to produce the skeleton, and the process is repeated until no change in the output image is observed. This operation is shown by equation (2.6):

    I ⊗ K{A} = ((...((I ⊗ A1) ⊗ A2)...) ⊗ A8)        (2.6)

Here I ⊗ K{A} indicates K successive passes over I, each pass a two-level decision using a group of structuring elements (usually 8, for 8-direction processing) {A} = {A1, A2, ..., A8}, repeated until convergence.

There are two basic approaches to thinning: pixel-wise and non-pixel-wise. In pixel-wise thinning, the image is processed locally and iteratively until a one-pixel-wide skeleton is obtained. These methods include erosion and iterative contour peeling; they are very sensitive to noise and may deform the shape of the character. Non-pixel-wise methods use some global information about the character during thinning.
Clustering-based thinning methods use the cluster centers as the skeleton. In [28], a skeleton-growing algorithm uses the gray-level image for skeletonization. It controls the development of the skeleton using iterative skeletonization and deletion of boundary pixels, nested within iterative binarization of the gray-level image. The results were compared with three other algorithms that

worked on binary images. The effects of the three other algorithms are given by images 1, 2 and 3 in figure 2.5: they failed to separate lines that are touching or very close to each other. The skeleton-growing algorithm could handle these problems, as shown in the 4th image of figure 2.5.

[Figure 2.5 Effect of different thinning methods]

In [13], pre-thinning logic is first applied to reduce the effect of binarization noise. The image is then thinned, and post-thinning logic is applied to remove thinning distortions such as hairs and split fork points. Post-thinning is also called pruning.

Pruning: The extra tail pixels generated by thinning or skeletonization are removed by pruning using equation (2.7):

    I_pruned = I_thinned ⊗ {A_i}        (2.7)

That is, the thinned image is processed with the structuring elements in {A_i} until no change occurs. One of the structuring elements in {A_i} is shown in figure 2.6: the 1 in the center is an object pixel, 0 is a background pixel, and X can be 0 or 1. The other structuring elements can be obtained by 45° rotations of this element. The pixel removal performed by pruning also cleans the image.

[Figure 2.6 A 3x3 pruning structuring element (center 1 = object pixel, 0 = background, X = don't care)]
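The pixel-wise "process locally and iteratively until one pixel wide" idea can be made concrete with the classical Zhang-Suen thinning scheme; this is one well-known instance of that family, offered as a sketch rather than any of the algorithms compared above:

```python
import numpy as np

def zhang_suen_thin(img):
    """Pixel-wise iterative thinning (Zhang-Suen): delete boundary
    pixels in two alternating sub-passes until nothing changes."""
    img = img.astype(np.uint8).copy()
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for i in range(1, img.shape[0] - 1):
                for j in range(1, img.shape[1] - 1):
                    if img[i, j] == 0:
                        continue
                    # clockwise 8-neighbourhood P2..P9, starting above
                    p = [img[i-1, j], img[i-1, j+1], img[i, j+1],
                         img[i+1, j+1], img[i+1, j], img[i+1, j-1],
                         img[i, j-1], img[i-1, j-1]]
                    b = sum(p)  # number of object neighbours
                    # number of 0 -> 1 transitions around the circle
                    a = sum(p[k] == 0 and p[(k+1) % 8] == 1 for k in range(8))
                    if step == 0:
                        cond = p[0]*p[2]*p[4] == 0 and p[2]*p[4]*p[6] == 0
                    else:
                        cond = p[0]*p[2]*p[6] == 0 and p[0]*p[4]*p[6] == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((i, j))
            for i, j in to_delete:
                img[i, j] = 0
                changed = True
    return img.astype(bool)
```

Applied to a thick stroke, the passes peel the contour from alternating sides, leaving an approximately centred one-pixel line; the transition count `a == 1` is what prevents the stroke from being broken.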

Cleaning: This is used to remove spurious points left in the image after operations such as thinning. Cleaning is done using erosion with a structuring element (similar to the pruning element with the X values set to 0) whose center matches the noise and whose neighborhood matches the object, and vice versa.

Compression

Here compression refers to representing the character shape in the image with minimal information. Thresholding, thin-edge extraction and morphological thinning (skeletonization) algorithms are used for compression. A minimal representation is essential in template-matching techniques, where large numbers of images need to be preserved for comparison. Feature extraction also becomes fast, owing to logical computations with only the two values 0 and 1 and to the minimal data involved in feature computation.

Normalization

Normalization methods aim to remove commonly observed variations introduced during writing which may otherwise influence the feature-extraction process. The document, the individual text lines, individual words and even characters within a word may be skewed, and some strokes of a character may be out of proportion compared to others. The shape of each character in a text should be made as close as possible to the natural standard shape of the character, so that the effect of these variations on the features is minimized. This is achieved by three basic types of normalization: skew normalization, slant normalization and size normalization. Contrast normalization of the text-page image is also needed, as variations in the page background may influence the normalization process. These normalization methods are briefly described below.

Skew normalization

There are two types of skew. Global skew is the misalignment of the complete page with respect to the horizontal direction, caused while scanning, writing, etc.
Local skew is the misalignment of individual characters in a given word, or with respect to the base line of a text, as shown in figure 2.7.

Every text line has a base line, and usually all the base lines should be parallel; if they are not, each base line needs to be normalized for orientation. Some characters can be distinguished only by their position relative to the base line (e.g. 9 and g in handwritten form), so identifying the baseline is important for recognizing a character. A wide variety of skew-detection algorithms have been proposed in the literature; the commonly used methods are the Hough transform, analysis of projection profiles, run-length analysis, etc.

[Figure 2.7 Skew and slope for normalization]

In [121], a skew-detection method is proposed for Bangla script. The script has a shirorekha (head-line), and the approach is based on detecting the shirorekha of words. Individual words in a text line are detected by connected-component labeling. As the upper envelopes of selected components contain the shirorekha information, they are found by column-wise scanning from the top of each component. Portions of the upper envelope satisfying the properties of a digital straight line are detected and then clustered into groups belonging to single text lines. Estimates from these individual clusters give the skew angle of each text (base) line. The proposed multi-skew detection technique has an accuracy of about 98.3%.

Slant normalization

Slant normalization is also called tilt correction. Some people write upright and some with a slant; the angle between the longest stroke in a word and the vertical direction is referred to as the slant. Slant normalization transforms all characters to a standard form with no slant. [19] uses alif, a very commonly used Arabic character that is almost vertical when there is no tilt, for tilt identification: the method scans for occurrences of this letter and estimates the tilt from the letter's slope.
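Of the skew-detection methods listed, projection-profile analysis is the simplest to sketch: the rotation that makes text lines horizontal maximizes the variance of the row-wise ink counts. A minimal illustration (angle range and step are assumptions, not values from the source):

```python
import numpy as np
from scipy.ndimage import rotate

def estimate_skew(binary, angles=np.arange(-10, 10.5, 0.5)):
    """Projection-profile skew estimation: return the trial rotation
    (in degrees) that maximises the variance of row ink counts,
    i.e. the rotation that deskews the page."""
    best_angle, best_score = 0.0, -1.0
    for a in angles:
        rot = rotate(binary.astype(float), a, reshape=False, order=0)
        profile = rot.sum(axis=1)      # ink per row
        score = profile.var()          # sharp peaks when lines are horizontal
        if score > best_score:
            best_angle, best_score = a, score
    return best_angle
```

For a page skewed by +5°, the function returns approximately -5, the correcting rotation; finer angle steps trade computation for precision.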

In [122], a method to correct the slant of Arabic text is proposed. To calculate the average slant angle of a word, all contours in the word are traversed and the slant angle of each near-vertical line is computed. For a contour pixel on row n, the absolute differences between the x-coordinates of adjacent points on rows n-2, n-1, n, n+1 and n+2 of the same contour are added; this sum gives the slantness of the contour, and if it is below a threshold the points are assumed to be part of a near-vertical line. The weighted average of the individual slant angles of all contours in a word gives the global average slant angle of the word, used for individual word slant correction.

Size normalization

Size normalization adjusts the size of the character in the image to a certain standard. The original sizes of handwritten characters vary greatly and may influence feature extraction. Some characters can be distinguished only by the aspect ratio of their shape (e.g. O and 0). Hairline strokes and small openings of characters are much less likely to be detected in text set in a small font size of 6 or 8 points (1 point = 1/72 inch) than in normal 10 to 12 point font sizes. Such images have to be scaled by size normalization before further processing. In most cases, the size of the character is normalized while retaining the shape and aspect ratio of the character; linear interpolation/decimation scaling methods are very commonly used [56]. Strokes with disproportionate lengths may distort the aspect ratio; other normalization techniques handle these kinds of problems, but they are computationally expensive. In [21], a fuzzy normalization method is applied to Chinese handwritten characters to deal with irregular stroke lengths by compressing the redundant portions and preserving the important features.
For special sample images with irregular stroke lengths, the recognition results improved from 80% using normal scaling to 85% using fuzzy normalization. In [22], resampling of handwritten-digit gray images is based on multi-rate filter theory, implemented by a cascade of interpolation and decimation filters using Hamming and Gaussian window functions. The recognition accuracy of the multi-rate method (96.8%) is compared with ratio-based normalization (96.7%) and simple scaling (96.5%), an improvement of 0.3% over normal scaling.
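Aspect-ratio-preserving size normalization, the common case described above, can be sketched with bilinear interpolation: scale so the longer side matches the target, then pad to a square frame. The function name and the 32-pixel target are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import zoom

def normalize_size(char_img, target=32):
    """Scale a character image so its longer side equals `target`,
    preserving the aspect ratio, then centre it in a target x target frame."""
    h, w = char_img.shape
    scale = target / max(h, w)
    scaled = zoom(char_img.astype(float), scale, order=1)  # bilinear
    out = np.zeros((target, target))
    oh, ow = scaled.shape
    oh, ow = min(oh, target), min(ow, target)
    y0, x0 = (target - oh) // 2, (target - ow) // 2
    out[y0:y0 + oh, x0:x0 + ow] = scaled[:oh, :ow]
    return out

norm = normalize_size(np.ones((60, 20)))  # a tall, narrow character
print(norm.shape)  # (32, 32)
```

A 60x20 character ends up occupying roughly a 32x11 region of the frame, so a thin character remains thin; scaling both axes independently to 32x32 would instead destroy the O-versus-0 distinction mentioned above.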

In [56], to make the input images the same size, cubic-spline interpolation in polynomial form and in poly-phase network form, and linear decimation by block averaging, are used. Interpolation in polynomial form shows better recognition rates than the poly-phase network form, while the latter provides a low-computational-complexity solution for real-time applications.

In [123], a Normalization-Cooperated Gradient Feature (NCGF) extraction method is proposed to alleviate the distortion of the original stroke direction caused by prior size normalization. Five 1-D coordinate normalization methods are used: Linear Normalization (LN), Non-Linear Normalization (NLN), Moment Normalization (MN), Bi-Moment Normalization (BMN) and Modified Centroid-Boundary Alignment (MCBA). They are extended to Pseudo-2-Dimensional (P2D) versions called Line Density Projection Interpolation (LDPI), P2DMN, P2DBMN and P2DCBA. The experimental results show that NCGF with the P2D methods has a lower error rate than the NCCF and Normalization-Based Gradient Function (NBGF) methods. The effect of each of these methods is shown in figure 2.8.

[Figure 2.8 Effect of different normalization methods on Chinese characters]

In [124], a fuzzy normalization method is proposed to normalize the irregular stroke lengths of Chinese characters, and the recognition results of fuzzy- and scale-normalized images are compared. When invariant features such as the number of strokes and ring data are tested with a maximum-distance clustering method on special samples, the fuzzy-normalized images give a recognition rate of 85%, compared to 80% for scaling normalization.

Contrast normalization

Contrast normalization deals with the correction of non-uniform contrast and brightness of the background surrounding the image. Eliminating the background of a page of text and making it uniformly

white improves the contrast of the text image. To normalize the local contrast in a document, researchers have proposed different methods. In [20], a least-mean-square (error) estimation method is proposed that also uses a generalized fuzzy operator to enhance the object of interest.

All these techniques are applied in many areas of image processing. While enhancing the image in a specific way, most of the techniques may introduce unexpected distortions and should therefore be applied with care. The outcome of the preprocessing stage should be a clean, normalized image with maximal shape information, maximal compression and minimal noise. In general, given the compromise between introducing further distortion and correcting existing problems, forming an effective preprocessing pipeline is a challenging open problem, often highly dependent on the nature of the images, the types of problems usually observed, and their severity for the intended task.

Preprocessing of a page of text image

The acquired image in document image processing is usually a page of text. To extract the text, one needs to perform line, word and character extraction [16][17], skew detection and correction at page level [121], etc. The background may be noisy due to old paper, rough surfaces, thin paper, colored paper, paper folds, back-page ink visibility, etc., and it may also be embedded with patterns or pictures. Hence background elimination or contrast normalization is needed to obtain a uniform-intensity background. Some researchers use smoothing and median filters to reduce the background noise [25], and some analyze the intensity variations in the background and foreground to generate a uniform-intensity background [23][24]. The detection of interfering marks such as blots, underscores and creases is complex, as they occur at random positions in the image and may even affect the text quality at those positions.
Another equally time-consuming task is the localized skew correction that is necessary for hand-set pages, and correction of the baseline curl in pages copied from bound volumes. Since we restrict our attention to a single character (including Kagunita) image at a time, we do not discuss these issues further in this thesis.

2.2.4 Preprocessing of character images

The characters extracted from a page of text need to be processed further. Compared to printed characters, handwritten characters need extensive preprocessing due to writing variations and noise from the pen, ink, paper quality and writing environment. The methods discussed here deal with handwritten character preprocessing; they can also be applied to printed characters depending on the kinds of problems (e.g., distortion, uneven print) observed. As we are dealing with character recognition, a survey of character preprocessing is of utmost importance, and we investigated the preprocessing applied to character images by the researchers. The broad categories of preprocessing operations and their roles have been discussed in the earlier sections. The character image extracted from a page of text needs to be normalized to make the character shape close to a standard shape, bounded within a fixed-size image with proportionate stroke lengths [20][21][22]. Some of the normalization operations are deslanting (removing slant) [122], size normalization (to compensate for scaling) [56][123] and stroke length normalization (to compensate for velocity and disproportionate stroke lengths) [124]. While writing, due to rapid or erratic pen movements, a hook-like pen movement may occur at the end or beginning of a stroke; dehooking [141] removes such hooks. Blocking algorithms perform bounding box extraction to detect exactly where the textual information lies within the scanned image, producing boundary-touching character images by removing the surrounding background pixels [15]. Sharpening filters help to sharpen the character shape in the image; in the case of blurred, faint or fine edges with intensities close to the background, a sharpening filter helps preserve the edges.
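The bounding-box extraction performed by blocking algorithms can be sketched in a few lines of NumPy; `bounding_box` is a hypothetical helper name, not taken from any cited work:

```python
import numpy as np

def bounding_box(binary):
    """Crop a binary character image (foreground = 1) to the tight box
    around its foreground pixels, removing the surrounding background."""
    ys, xs = np.nonzero(binary)
    if ys.size == 0:
        return binary  # empty image: nothing to crop
    return binary[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```

The result is the boundary-touching character image the text describes: every edge of the crop contains at least one foreground pixel.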
Smoothing removes jitter noise from the image and reduces abrupt directional changes between adjacent pixels [15]. Gray-tone scanning enables adaptive binarization and gray-scale feature extraction. Adaptive local binarization helps cope with uneven contrast, but fine or faint connecting strokes are more easily detected by full gray-scale processing. Binarization converts a gray image into a black-and-white image; this process may cause fragmentation, which can be remedied by filling in the broken joints. Thinning reduces the patterns to a single-pixel-wide thin-line representation of the image called the skeleton. Any unwanted edges remaining after skeletonization need to be pruned, and spurious dots need to be cleaned [15]. This also reduces the amount of data, and shape analysis for feature extraction is easier with thin-line patterns.
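As a concrete instance of global binarization, a minimal NumPy implementation of Otsu's classic threshold (one standard choice, not necessarily the method used in the cited works) might look like:

```python
import numpy as np

def otsu_threshold(gray):
    """Global Otsu binarization: pick the threshold that maximizes the
    between-class variance of the gray-level histogram."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0 or w1 == 0:
            continue  # one class empty: undefined split
        mu0 = (levels[:t] * p[:t]).sum() / w0
        mu1 = (levels[t:] * p[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

def binarize(gray):
    """Dark text on a light page: foreground = 1 below the threshold."""
    return (gray < otsu_threshold(gray)).astype(np.uint8)
```

An adaptive local variant would apply the same criterion per window, which is what helps with the uneven contrast mentioned above.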

The above preprocessing methods are chosen by researchers depending on the quality of the input image and the common problems observed in their usage scenario. A general pipeline flexible enough to handle any deformation observed in the input image is still a research area. We discuss our approach and studies in a later chapter.

2.3 Segmentation

Segmentation, in general, deals with separating out parts from a larger entity. As mentioned earlier, an image for character recognition may be a page, which can be segmented into lines, which in turn can be segmented into words, then into individual characters and then into strokes (in some cases). Though we deal with character-level images, and segmentation is therefore not a major concern in our work, we briefly discuss the approaches to the various types of segmentation in this section for completeness. A survey on segmentation is given in [57][160]. In offline OCR, segmentation deals with extraction of:

- text lines from a page image
- words from a text-line image
- character images from a word image
- individual strokes from a character image (part of feature extraction)

Character and individual-stroke segmentation fall under internal segmentation, whereas paragraph, line and word segmentation fall under external segmentation. The segmentation performed depends on whether the HCR system uses words or characters for recognition. Holistic approaches attempt to recognize the attributes present within whole words rather than partitioning a word and attempting to recognize each part of it [11][12]. With the analytical approach, on the other hand, the characters are individually separated from the rest of the word; the strokes can then be extracted from the character image, or the whole character image is used for further processing. The analytical approach is better suited for on-line applications [7][44][30]. For off-line applications, both the analytical and holistic approaches can be used.
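For intuition, external segmentation of well-separated horizontal text lines can be sketched with a simple horizontal projection profile; this naive baseline (names are illustrative) is far weaker than the methods surveyed next, which must handle skew and overlapping lines:

```python
import numpy as np

def segment_lines(binary_page):
    """Split a binary page (foreground = 1) into text-line images by
    cutting at rows whose horizontal projection contains no ink.
    Assumes roughly horizontal, non-overlapping lines."""
    profile = binary_page.sum(axis=1)  # ink count per row
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = y                     # line begins
        elif ink == 0 and start is not None:
            lines.append(binary_page[start:y])  # line ends
            start = None
    if start is not None:
        lines.append(binary_page[start:])
    return lines
```

The same idea with a vertical profile gives a crude word/character splitter, which is exactly where cursive and overlapped writing breaks it.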
External segmentation: Because of inter-line distance variability and baseline skew variability, line segmentation in an unconstrained handwritten document is very difficult. This

becomes still more complicated when overlapping occurs between two consecutive text lines. The techniques followed by some researchers are as follows. In [16], the Run Length Smearing Algorithm (RLSA) and morphological operations are used to segment individual text lines from an unconstrained handwritten document image. RLSA links together neighboring black/white areas that are within a predefined distance, so that each word forms a connected component. On these components, morphological erosion is performed to extract the foreground and background information needed for line segmentation. [17] uses a dual method based on the interdependency between text lines and inter-line gaps, using histogram peaks and inter-peak valleys for line identification. The intra-line curve cuts through the character strokes of a text line as many times as possible, as long as these lines are straight. The imaginary inter-line curve that separates the text lines above and below it is generated similarly, subject to conditions such as that the inter-line curve must not cross the intra-line curve and vice versa. Both curves grow in parallel, guiding one another, and after a few iterations semi-optimal piecewise-linear curves for both text lines and inter-line gaps are obtained for line identification.

Internal segmentation: The complexity of segmenting the characters within a word increases as we move from words of isolated discrete characters to cursive handwriting, more complex mixed handwriting and overlapped writing. This is still a complex unsolved problem. The natural skewness in handwritten words poses challenges for automatic character segmentation: handwritten words contain both consistent and inconsistent skewness, and as the majority of HCR systems depend upon upright images, skewed images severely degrade their performance. In [14], the analytic segmentation technique is used to separate individual characters from a word.
Firstly, a simple heuristic approach is used to identify valid segmentation points between the characters. This usually looks for the minima or arcs between characters that are common in handwritten cursive script; in many cases these arcs are the ideal segmentation points. Holes occur in characters that are totally or partially closed (e.g., a, u, o), and sometimes the segmentation points may cut such a holed character in half. To avoid such cuts, after deciding a segmentation point, a hole-seeking algorithm checks that it has not segmented a character in half, by checking for holes and for the closeness of two segmentation points relative to the average

character width. The process of finding correct segmentation points is automated using a feed-forward neural network with back-propagation, trained on manually segmented handwritten words. During testing, the heuristically segmented word image is input to the network to obtain the correctly identified segmentation points; these are retained and the remaining points are removed. In [70], segmentation of handwritten English numerals and alphabets is done by moving a "marble" down either side of the touching characters to select the cut point: the marble moves downwards, diagonally downwards, to the right or to the left based on its current position and surroundings, and the cut is made at the point where the marble falls. Segmentation is also used for multi-script recognition: script identification can be done with a holistic approach and the subsequent text recognition of a particular script with an analytic approach [81], thus exploiting both approaches. In [34], online signature verification is done by extracting global features from the signature using the holistic approach and local features using the analytic approach. Segmentation down to the word level can be part of preprocessing, but the segmentation of characters, and of strokes from characters, is often part of feature extraction; in such cases the feature extraction process is called segmentation-based feature extraction [7].

2.4 Feature Extraction

Character recognition algorithms do not usually work on raw images; they use features identified from the image as the input dimensions for recognition. Identifying the right set of features, and extracting them from the image effectively and efficiently, are often the key elements of the character recognition problem. In this section, we briefly review the existing literature in this regard. Our approach is detailed in chapter 7.
A good feature set should capture the characteristics that help distinguish a class from other classes, while remaining invariant to differences within the class. Based on the segmentation model, HCR can be segmentation-based (analytic approach) or segmentation-free (holistic approach) [7][11]. Isolated-character applications use the analytic approach. For cursive word recognition, some researchers have also used the analytic approach, wherein the characters in the word are first segmented and then used for

feature extraction [40]. But segmentation is a difficult task, so some researchers have attempted word recognition based on the holistic approach with dictionary support. A survey of feature extraction is presented in [158]. The chief design task is to select the best set of features: the one that maximizes the recognition rate with the fewest elements. This problem can be formulated as a dynamic programming problem of selecting the k best features out of N with respect to a cost function such as Fisher's discriminant ratio. Selecting features with such a methodology requires expensive computation and most of the time yields a suboptimal solution. Therefore, feature selection is mostly done by heuristics or intuition for a specific type of application, usually guided by empirical experiments, an analysis of the shapes of the various characters in the target set, etc.

2.4.1 Holistic approach

Human readers easily resolve the confusion between similar shapes because they do not consider each letter or numeral in isolation; they also adapt instantly to each typeface and even to a mixture of typefaces. Automating the same ability is very difficult. Holistic approaches mimic the way humans perceive text. The holistic strategy employs top-down approaches that recognize the full word, eliminating the segmentation problem. The price for this computational saving is that the OCR problem is constrained to a limited vocabulary. This scheme can tolerate dramatic amounts of deformation within words, as often seen in cursive script. However, it depends greatly on its prescribed lexicon, since the lexicon entries are the units against which the objects of recognition are compared. Most of the conventional handwritten word recognition methods found in the literature are lexicon-driven: one is given a handwritten word image together with a list of possible target words, the lexicon. The recognition of the word image is basically a matching process.
Each algorithm gives a way of matching the word image against the given text words in the lexicon, and the best match gives the recognition result [11][12]. Due to the complexity introduced by a whole cursive or mixed-handwriting word (compared to that of a single character or stroke), the recognition accuracy is lower.
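The Fisher-discriminant-ratio criterion mentioned in the feature selection discussion above can be sketched as follows; the greedy top-k pick over per-feature scores is the cheap heuristic alternative to exhaustive selection (function names are illustrative):

```python
import numpy as np

def fisher_ratio(features, labels):
    """Score each feature column by Fisher's discriminant ratio
    (between-class variance over within-class variance);
    higher means more discriminative."""
    classes = np.unique(labels)
    overall = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in classes:
        fc = features[labels == c]
        between += fc.shape[0] * (fc.mean(axis=0) - overall) ** 2
        within += ((fc - fc.mean(axis=0)) ** 2).sum(axis=0)
    return between / np.maximum(within, 1e-12)  # guard zero variance

def top_k_features(features, labels, k):
    """Greedily keep the k highest-scoring features."""
    return np.argsort(fisher_ratio(features, labels))[::-1][:k]
```

Scoring features independently ignores correlations between them, which is one reason such heuristics yield suboptimal subsets, as the text notes.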

2.4.2 Analytic approach

Most HCR systems use the analytic approach and recognize individual characters primarily by their shape [6]. Shape is a property of a class of characters and also of a particular method of observation or measurement; it cannot depend on size, color or location. In most languages the measurements relate to features such as length, width, curvature, orientation and the relative position of strokes, and they should be invariant to translation and scale. Hence the challenge is to find descriptions of the character image that are invariant to transformations that alter figures only in unimportant ways, yet sensitive to transformations that change figures in important ways. Every character image should have some unique feature that identifies it uniquely; all characters that cannot be distinguished by a given method of measurement are said to have the same shape, resulting in ambiguity. Hence different kinds of shape measurements are needed for feature extraction. The analytic strategies employ bottom-up approaches, starting from the stroke or character level and working towards producing meaningful text. Segmentation is required, which not only adds extra complexity to the problem but also introduces segmentation error into the system. However, with the cooperation of the segmentation stage, the problem is reduced to the recognition of simple isolated characters or strokes, which can be handled for an unlimited vocabulary with high recognition rates [29]. The two strategies are compared in table 2.1.

Table 2.1 Strategies: Holistic vs. Analytic

Holistic Strategy: whole-word recognition; limited vocabulary; no segmentation.
Analytic Strategy: sub-word or letter recognition; unlimited vocabulary; requires explicit or implicit segmentation.

Based on the method of data acquisition and the kind of information available for feature extraction, we have two categories of features: off-line features and on-line features.

Offline features

To recognize a handwritten character image, different primitive features are extracted from the preprocessed character image during or after segmentation [29]. The features can be extracted from the whole image or from specific parts of it by dividing the image into several overlapping or non-overlapping zones (windows or cells). Hundreds of features are mentioned in the literature; the major ones can be broadly categorized as follows.

Statistical features: These features are derived from the statistical distribution of foreground points in the image. They provide high speed and low complexity and accommodate style variations to some extent. Statistical features can be extracted from the whole image or from zones of it. Features such as the density of points, number of strokes, counts of strokes in each direction, area, perimeter, number of crossings (the number of times line segments are traversed by vectors in specified directions), curve distance from the boundary, compactness, counts of start and end points, pen-ups and number of sub-patterns can be extracted from the whole image and also from each zone.

Structural features: The two common classes of structural features are straight lines and curved lines. These features describe the structure of the character shape: whether a shape is straight, curved, circular, etc. Usually a character shape has many structural features, and hence it needs to be segmented. For a segment identified as a straight line, its orientation or inclination angle distinguishes it as a vertical line, horizontal line, positive slant, negative slant, etc. Categorizing curved lines is a more complex task. These features have high tolerance to distortion and style variation, and they also tolerate a certain degree of translation and rotation. The sequence of structural features forming a character shape may itself be used as a feature.
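A minimal example of zone-based statistical features, here just the foreground density of each cell of a 3x3 grid (an illustrative sketch, not a specific cited method):

```python
import numpy as np

def zone_densities(binary, rows=3, cols=3):
    """Statistical zoning feature: fraction of foreground pixels in
    each cell of a rows x cols grid, flattened into a vector."""
    h, w = binary.shape
    feats = []
    for i in range(rows):
        for j in range(cols):
            cell = binary[i * h // rows:(i + 1) * h // rows,
                          j * w // cols:(j + 1) * w // cols]
            feats.append(cell.mean())  # density of this zone
    return np.array(feats)
```

Other statistical features listed above (crossing counts, perimeter, etc.) can be computed per zone in the same way to build a longer feature vector.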
Geometrical features: Since a character shape may have many structural features, simply identifying a segment as a line or curve, or as a vertical line, may not describe the shape fully; geometrical features describe it further. Hence geometrical features are also referred to as structural features in the literature. The length of a line or curve, its angle of orientation, etc., are some of the geometric features. These features may represent global (whole-image) or zonal (local, part-of-image) properties of characters.

Topological (positional) features: These features determine the relative position of a geometrical feature: the line/curve position within a zone (the whole image may be considered a single zone), the positions of the start and end points of the character, and so on. For example, if an image is divided into non-overlapping 3x3 zones, a curve may lie in the left region of zone (1,1). As topological features further characterize geometrical features, and geometrical features distinguish structural features, we refer to these three categories collectively as structural features in this report unless they need to be distinguished.

Global transformation and series expansion features: The transform-domain representation of an image generally highlights information that cannot be visualized in the spatial domain and can be used for generating features. The signal representation of the image provides additional opportunities, as it can be transformed into other domains (e.g., time and frequency). Such signals can be represented as a linear combination of a series of simpler, well-defined functions; the coefficients of the linear combination provide a compact encoding known as a series expansion. Some common transform and series-expansion methods of feature extraction are as follows.

Fourier Transform: It represents the spatial-domain image as a summation of sinusoids of varying frequencies and amplitudes. The general procedure is to choose the magnitude spectrum of the image as the feature vector. One of the most attractive properties of the Fourier transform is its ability to recognize position-shifted characters, since only the magnitude spectrum is observed and the phase is ignored. The drawback is that the local (time) information is lost.

Gabor Transform: The Gabor transform is a variation of the windowed Fourier transform in which the window is defined by a Gaussian function.
This transformation maps a signal into a two-dimensional function of time and frequency. The drawback is that the window size remains fixed for all frequencies. By varying the width and the orientation angle of the Gaussian function, Gabor wavelets can be generated: width variation helps extract edges of varying thickness, and the orientation angle helps extract edges in a particular direction.

Wavelet transform: Wavelet transformation performs multi-resolution analysis. It provides more flexibility than the Gabor transform, in that one can vary the window size
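The translation tolerance of Fourier magnitude features noted above can be verified with a small NumPy experiment (using circular shifts, for which the magnitude spectrum is exactly invariant by the shift theorem):

```python
import numpy as np

# A small "character" and a translated copy of it.
img = np.zeros((16, 16))
img[4:8, 4:8] = 1.0
shifted = np.roll(np.roll(img, 3, axis=0), 2, axis=1)

# The 2-D Fourier magnitude ignores the phase shift introduced by
# translation, so the magnitude spectra coincide even though the
# spatial images differ.
mag = np.abs(np.fft.fft2(img))
mag_shifted = np.abs(np.fft.fft2(shifted))
print(np.allclose(mag, mag_shifted))  # True: magnitudes match
```

This is precisely the invariance exploited when magnitude spectra are used as features, and discarding the phase is also why local position information is lost.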


More information

N.Priya. Keywords Compass mask, Threshold, Morphological Operators, Statistical Measures, Text extraction

N.Priya. Keywords Compass mask, Threshold, Morphological Operators, Statistical Measures, Text extraction Volume, Issue 8, August ISSN: 77 8X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Combined Edge-Based Text

More information

CoE4TN4 Image Processing

CoE4TN4 Image Processing CoE4TN4 Image Processing Chapter 11 Image Representation & Description Image Representation & Description After an image is segmented into regions, the regions are represented and described in a form suitable

More information

Binary Image Processing. Introduction to Computer Vision CSE 152 Lecture 5

Binary Image Processing. Introduction to Computer Vision CSE 152 Lecture 5 Binary Image Processing CSE 152 Lecture 5 Announcements Homework 2 is due Apr 25, 11:59 PM Reading: Szeliski, Chapter 3 Image processing, Section 3.3 More neighborhood operators Binary System Summary 1.

More information

Keywords: Thresholding, Morphological operations, Image filtering, Adaptive histogram equalization, Ceramic tile.

Keywords: Thresholding, Morphological operations, Image filtering, Adaptive histogram equalization, Ceramic tile. Volume 3, Issue 7, July 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Blobs and Cracks

More information

Vivekananda. Collegee of Engineering & Technology. Question and Answers on 10CS762 /10IS762 UNIT- 5 : IMAGE ENHANCEMENT.

Vivekananda. Collegee of Engineering & Technology. Question and Answers on 10CS762 /10IS762 UNIT- 5 : IMAGE ENHANCEMENT. Vivekananda Collegee of Engineering & Technology Question and Answers on 10CS762 /10IS762 UNIT- 5 : IMAGE ENHANCEMENT Dept. Prepared by Harivinod N Assistant Professor, of Computer Science and Engineering,

More information

Development of an Automated Fingerprint Verification System

Development of an Automated Fingerprint Verification System Development of an Automated Development of an Automated Fingerprint Verification System Fingerprint Verification System Martin Saveski 18 May 2010 Introduction Biometrics the use of distinctive anatomical

More information

Artifacts and Textured Region Detection

Artifacts and Textured Region Detection Artifacts and Textured Region Detection 1 Vishal Bangard ECE 738 - Spring 2003 I. INTRODUCTION A lot of transformations, when applied to images, lead to the development of various artifacts in them. In

More information

Structural Feature Extraction to recognize some of the Offline Isolated Handwritten Gujarati Characters using Decision Tree Classifier

Structural Feature Extraction to recognize some of the Offline Isolated Handwritten Gujarati Characters using Decision Tree Classifier Structural Feature Extraction to recognize some of the Offline Isolated Handwritten Gujarati Characters using Decision Tree Classifier Hetal R. Thaker Atmiya Institute of Technology & science, Kalawad

More information

CHAPTER 6 DETECTION OF MASS USING NOVEL SEGMENTATION, GLCM AND NEURAL NETWORKS

CHAPTER 6 DETECTION OF MASS USING NOVEL SEGMENTATION, GLCM AND NEURAL NETWORKS 130 CHAPTER 6 DETECTION OF MASS USING NOVEL SEGMENTATION, GLCM AND NEURAL NETWORKS A mass is defined as a space-occupying lesion seen in more than one projection and it is described by its shapes and margin

More information

Segmentation and Grouping

Segmentation and Grouping Segmentation and Grouping How and what do we see? Fundamental Problems ' Focus of attention, or grouping ' What subsets of pixels do we consider as possible objects? ' All connected subsets? ' Representation

More information

Segmentation of Kannada Handwritten Characters and Recognition Using Twelve Directional Feature Extraction Techniques

Segmentation of Kannada Handwritten Characters and Recognition Using Twelve Directional Feature Extraction Techniques Segmentation of Kannada Handwritten Characters and Recognition Using Twelve Directional Feature Extraction Techniques 1 Lohitha B.J, 2 Y.C Kiran 1 M.Tech. Student Dept. of ISE, Dayananda Sagar College

More information

Topic 6 Representation and Description

Topic 6 Representation and Description Topic 6 Representation and Description Background Segmentation divides the image into regions Each region should be represented and described in a form suitable for further processing/decision-making Representation

More information

COMPUTER AND ROBOT VISION

COMPUTER AND ROBOT VISION VOLUME COMPUTER AND ROBOT VISION Robert M. Haralick University of Washington Linda G. Shapiro University of Washington A^ ADDISON-WESLEY PUBLISHING COMPANY Reading, Massachusetts Menlo Park, California

More information

Digital Image Processing Fundamentals

Digital Image Processing Fundamentals Ioannis Pitas Digital Image Processing Fundamentals Chapter 7 Shape Description Answers to the Chapter Questions Thessaloniki 1998 Chapter 7: Shape description 7.1 Introduction 1. Why is invariance to

More information

Image Processing: Final Exam November 10, :30 10:30

Image Processing: Final Exam November 10, :30 10:30 Image Processing: Final Exam November 10, 2017-8:30 10:30 Student name: Student number: Put your name and student number on all of the papers you hand in (if you take out the staple). There are always

More information

ECE 172A: Introduction to Intelligent Systems: Machine Vision, Fall Midterm Examination

ECE 172A: Introduction to Intelligent Systems: Machine Vision, Fall Midterm Examination ECE 172A: Introduction to Intelligent Systems: Machine Vision, Fall 2008 October 29, 2008 Notes: Midterm Examination This is a closed book and closed notes examination. Please be precise and to the point.

More information

Mathematical Morphology and Distance Transforms. Robin Strand

Mathematical Morphology and Distance Transforms. Robin Strand Mathematical Morphology and Distance Transforms Robin Strand robin.strand@it.uu.se Morphology Form and structure Mathematical framework used for: Pre-processing Noise filtering, shape simplification,...

More information

CHAPTER 3 IMAGE ENHANCEMENT IN THE SPATIAL DOMAIN

CHAPTER 3 IMAGE ENHANCEMENT IN THE SPATIAL DOMAIN CHAPTER 3 IMAGE ENHANCEMENT IN THE SPATIAL DOMAIN CHAPTER 3: IMAGE ENHANCEMENT IN THE SPATIAL DOMAIN Principal objective: to process an image so that the result is more suitable than the original image

More information

SKEW DETECTION AND CORRECTION

SKEW DETECTION AND CORRECTION CHAPTER 3 SKEW DETECTION AND CORRECTION When the documents are scanned through high speed scanners, some amount of tilt is unavoidable either due to manual feed or auto feed. The tilt angle induced during

More information

K S Prasanna Kumar et al,int.j.computer Techology & Applications,Vol 3 (1),

K S Prasanna Kumar et al,int.j.computer Techology & Applications,Vol 3 (1), Optical Character Recognition (OCR) for Kannada numerals using Left Bottom 1/4 th segment minimum features extraction K.S. Prasanna Kumar Research Scholar, JJT University, Jhunjhunu, Rajasthan, India prasannakumarks@acharya.ac.in

More information

Digital Image Processing. Prof. P.K. Biswas. Department of Electronics & Electrical Communication Engineering

Digital Image Processing. Prof. P.K. Biswas. Department of Electronics & Electrical Communication Engineering Digital Image Processing Prof. P.K. Biswas Department of Electronics & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Image Segmentation - III Lecture - 31 Hello, welcome

More information

Fundamentals of Digital Image Processing

Fundamentals of Digital Image Processing \L\.6 Gw.i Fundamentals of Digital Image Processing A Practical Approach with Examples in Matlab Chris Solomon School of Physical Sciences, University of Kent, Canterbury, UK Toby Breckon School of Engineering,

More information

Segmentation of Characters of Devanagari Script Documents

Segmentation of Characters of Devanagari Script Documents WWJMRD 2017; 3(11): 253-257 www.wwjmrd.com International Journal Peer Reviewed Journal Refereed Journal Indexed Journal UGC Approved Journal Impact Factor MJIF: 4.25 e-issn: 2454-6615 Manpreet Kaur Research

More information

2D Image Processing INFORMATIK. Kaiserlautern University. DFKI Deutsches Forschungszentrum für Künstliche Intelligenz

2D Image Processing INFORMATIK. Kaiserlautern University.   DFKI Deutsches Forschungszentrum für Künstliche Intelligenz 2D Image Processing - Filtering Prof. Didier Stricker Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de 1 What is image filtering?

More information

Chapter 3: Intensity Transformations and Spatial Filtering

Chapter 3: Intensity Transformations and Spatial Filtering Chapter 3: Intensity Transformations and Spatial Filtering 3.1 Background 3.2 Some basic intensity transformation functions 3.3 Histogram processing 3.4 Fundamentals of spatial filtering 3.5 Smoothing

More information

Filtering and Enhancing Images

Filtering and Enhancing Images KECE471 Computer Vision Filtering and Enhancing Images Chang-Su Kim Chapter 5, Computer Vision by Shapiro and Stockman Note: Some figures and contents in the lecture notes of Dr. Stockman are used partly.

More information

Chapter Review of HCR

Chapter Review of HCR Chapter 3 [3]Literature Review The survey of literature on character recognition showed that some of the researchers have worked based on application requirements like postal code identification [118],

More information

Edges and Binary Images

Edges and Binary Images CS 699: Intro to Computer Vision Edges and Binary Images Prof. Adriana Kovashka University of Pittsburgh September 5, 205 Plan for today Edge detection Binary image analysis Homework Due on 9/22, :59pm

More information

Computer Vision I - Basics of Image Processing Part 2

Computer Vision I - Basics of Image Processing Part 2 Computer Vision I - Basics of Image Processing Part 2 Carsten Rother 07/11/2014 Computer Vision I: Basics of Image Processing Roadmap: Basics of Digital Image Processing Computer Vision I: Basics of Image

More information

Image Analysis Image Segmentation (Basic Methods)

Image Analysis Image Segmentation (Basic Methods) Image Analysis Image Segmentation (Basic Methods) Christophoros Nikou cnikou@cs.uoi.gr Images taken from: R. Gonzalez and R. Woods. Digital Image Processing, Prentice Hall, 2008. Computer Vision course

More information

One Dim~nsional Representation Of Two Dimensional Information For HMM Based Handwritten Recognition

One Dim~nsional Representation Of Two Dimensional Information For HMM Based Handwritten Recognition One Dim~nsional Representation Of Two Dimensional Information For HMM Based Handwritten Recognition Nafiz Arica Dept. of Computer Engineering, Middle East Technical University, Ankara,Turkey nafiz@ceng.metu.edu.

More information

Hidden Loop Recovery for Handwriting Recognition

Hidden Loop Recovery for Handwriting Recognition Hidden Loop Recovery for Handwriting Recognition David Doermann Institute of Advanced Computer Studies, University of Maryland, College Park, USA E-mail: doermann@cfar.umd.edu Nathan Intrator School of

More information

Segmentation Based Optical Character Recognition for Handwritten Marathi characters

Segmentation Based Optical Character Recognition for Handwritten Marathi characters Segmentation Based Optical Character Recognition for Handwritten Marathi characters Madhav Vaidya 1, Yashwant Joshi 2,Milind Bhalerao 3 Department of Information Technology 1 Department of Electronics

More information

Texture. Frequency Descriptors. Frequency Descriptors. Frequency Descriptors. Frequency Descriptors. Frequency Descriptors

Texture. Frequency Descriptors. Frequency Descriptors. Frequency Descriptors. Frequency Descriptors. Frequency Descriptors Texture The most fundamental question is: How can we measure texture, i.e., how can we quantitatively distinguish between different textures? Of course it is not enough to look at the intensity of individual

More information

Perception. Autonomous Mobile Robots. Sensors Vision Uncertainties, Line extraction from laser scans. Autonomous Systems Lab. Zürich.

Perception. Autonomous Mobile Robots. Sensors Vision Uncertainties, Line extraction from laser scans. Autonomous Systems Lab. Zürich. Autonomous Mobile Robots Localization "Position" Global Map Cognition Environment Model Local Map Path Perception Real World Environment Motion Control Perception Sensors Vision Uncertainties, Line extraction

More information

Computer Vision 2. SS 18 Dr. Benjamin Guthier Professur für Bildverarbeitung. Computer Vision 2 Dr. Benjamin Guthier

Computer Vision 2. SS 18 Dr. Benjamin Guthier Professur für Bildverarbeitung. Computer Vision 2 Dr. Benjamin Guthier Computer Vision 2 SS 18 Dr. Benjamin Guthier Professur für Bildverarbeitung Computer Vision 2 Dr. Benjamin Guthier 1. IMAGE PROCESSING Computer Vision 2 Dr. Benjamin Guthier Content of this Chapter Non-linear

More information

Lecture 4 Image Enhancement in Spatial Domain

Lecture 4 Image Enhancement in Spatial Domain Digital Image Processing Lecture 4 Image Enhancement in Spatial Domain Fall 2010 2 domains Spatial Domain : (image plane) Techniques are based on direct manipulation of pixels in an image Frequency Domain

More information

Image representation. 1. Introduction

Image representation. 1. Introduction Image representation Introduction Representation schemes Chain codes Polygonal approximations The skeleton of a region Boundary descriptors Some simple descriptors Shape numbers Fourier descriptors Moments

More information

Handwritten Gurumukhi Character Recognition by using Recurrent Neural Network

Handwritten Gurumukhi Character Recognition by using Recurrent Neural Network 139 Handwritten Gurumukhi Character Recognition by using Recurrent Neural Network Harmit Kaur 1, Simpel Rani 2 1 M. Tech. Research Scholar (Department of Computer Science & Engineering), Yadavindra College

More information

CHAPTER 8 COMPOUND CHARACTER RECOGNITION USING VARIOUS MODELS

CHAPTER 8 COMPOUND CHARACTER RECOGNITION USING VARIOUS MODELS CHAPTER 8 COMPOUND CHARACTER RECOGNITION USING VARIOUS MODELS 8.1 Introduction The recognition systems developed so far were for simple characters comprising of consonants and vowels. But there is one

More information

An Intuitive Explanation of Fourier Theory

An Intuitive Explanation of Fourier Theory An Intuitive Explanation of Fourier Theory Steven Lehar slehar@cns.bu.edu Fourier theory is pretty complicated mathematically. But there are some beautifully simple holistic concepts behind Fourier theory

More information

Morphological Image Processing

Morphological Image Processing Morphological Image Processing Ranga Rodrigo October 9, 29 Outline Contents Preliminaries 2 Dilation and Erosion 3 2. Dilation.............................................. 3 2.2 Erosion..............................................

More information

Cursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network

Cursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network Cursive Handwriting Recognition System Using Feature Extraction and Artificial Neural Network Utkarsh Dwivedi 1, Pranjal Rajput 2, Manish Kumar Sharma 3 1UG Scholar, Dept. of CSE, GCET, Greater Noida,

More information

Text line Segmentation of Curved Document Images

Text line Segmentation of Curved Document Images RESEARCH ARTICLE S OPEN ACCESS Text line Segmentation of Curved Document Images Anusree.M *, Dhanya.M.Dhanalakshmy ** * (Department of Computer Science, Amrita Vishwa Vidhyapeetham, Coimbatore -641 11)

More information

A New Algorithm for Detecting Text Line in Handwritten Documents

A New Algorithm for Detecting Text Line in Handwritten Documents A New Algorithm for Detecting Text Line in Handwritten Documents Yi Li 1, Yefeng Zheng 2, David Doermann 1, and Stefan Jaeger 1 1 Laboratory for Language and Media Processing Institute for Advanced Computer

More information

MR IMAGE SEGMENTATION

MR IMAGE SEGMENTATION MR IMAGE SEGMENTATION Prepared by : Monil Shah What is Segmentation? Partitioning a region or regions of interest in images such that each region corresponds to one or more anatomic structures Classification

More information

Chapter 10: Image Segmentation. Office room : 841

Chapter 10: Image Segmentation.   Office room : 841 Chapter 10: Image Segmentation Lecturer: Jianbing Shen Email : shenjianbing@bit.edu.cn Office room : 841 http://cs.bit.edu.cn/shenjianbing cn/shenjianbing Contents Definition and methods classification

More information

Logical Templates for Feature Extraction in Fingerprint Images

Logical Templates for Feature Extraction in Fingerprint Images Logical Templates for Feature Extraction in Fingerprint Images Bir Bhanu, Michael Boshra and Xuejun Tan Center for Research in Intelligent Systems University of Califomia, Riverside, CA 9252 1, USA Email:

More information

Slant Correction using Histograms

Slant Correction using Histograms Slant Correction using Histograms Frank de Zeeuw Bachelor s Thesis in Artificial Intelligence Supervised by Axel Brink & Tijn van der Zant July 12, 2006 Abstract Slant is one of the characteristics that

More information

Lecture: Segmentation I FMAN30: Medical Image Analysis. Anders Heyden

Lecture: Segmentation I FMAN30: Medical Image Analysis. Anders Heyden Lecture: Segmentation I FMAN30: Medical Image Analysis Anders Heyden 2017-11-13 Content What is segmentation? Motivation Segmentation methods Contour-based Voxel/pixel-based Discussion What is segmentation?

More information

11. Gray-Scale Morphology. Computer Engineering, i Sejong University. Dongil Han

11. Gray-Scale Morphology. Computer Engineering, i Sejong University. Dongil Han Computer Vision 11. Gray-Scale Morphology Computer Engineering, i Sejong University i Dongil Han Introduction Methematical morphology represents image objects as sets in a Euclidean space by Serra [1982],

More information

Vision. OCR and OCV Application Guide OCR and OCV Application Guide 1/14

Vision. OCR and OCV Application Guide OCR and OCV Application Guide 1/14 Vision OCR and OCV Application Guide 1.00 OCR and OCV Application Guide 1/14 General considerations on OCR Encoded information into text and codes can be automatically extracted through a 2D imager device.

More information

SIFT - scale-invariant feature transform Konrad Schindler

SIFT - scale-invariant feature transform Konrad Schindler SIFT - scale-invariant feature transform Konrad Schindler Institute of Geodesy and Photogrammetry Invariant interest points Goal match points between images with very different scale, orientation, projective

More information

Efficient Nonlinear Image Processing Algorithms

Efficient Nonlinear Image Processing Algorithms Efficient Nonlinear Image Processing Algorithms SANJIT K. MITRA Department of Electrical & Computer Engineering University of California Santa Barbara, California Outline Introduction Quadratic Volterra

More information

SCENE TEXT BINARIZATION AND RECOGNITION

SCENE TEXT BINARIZATION AND RECOGNITION Chapter 5 SCENE TEXT BINARIZATION AND RECOGNITION 5.1 BACKGROUND In the previous chapter, detection of text lines from scene images using run length based method and also elimination of false positives

More information

Image Enhancement: To improve the quality of images

Image Enhancement: To improve the quality of images Image Enhancement: To improve the quality of images Examples: Noise reduction (to improve SNR or subjective quality) Change contrast, brightness, color etc. Image smoothing Image sharpening Modify image

More information

LITERATURE REVIEW. For Indian languages most of research work is performed firstly on Devnagari script and secondly on Bangla script.

LITERATURE REVIEW. For Indian languages most of research work is performed firstly on Devnagari script and secondly on Bangla script. LITERATURE REVIEW For Indian languages most of research work is performed firstly on Devnagari script and secondly on Bangla script. The study of recognition for handwritten Devanagari compound character

More information

Outline 7/2/201011/6/

Outline 7/2/201011/6/ Outline Pattern recognition in computer vision Background on the development of SIFT SIFT algorithm and some of its variations Computational considerations (SURF) Potential improvement Summary 01 2 Pattern

More information

Time Stamp Detection and Recognition in Video Frames

Time Stamp Detection and Recognition in Video Frames Time Stamp Detection and Recognition in Video Frames Nongluk Covavisaruch and Chetsada Saengpanit Department of Computer Engineering, Chulalongkorn University, Bangkok 10330, Thailand E-mail: nongluk.c@chula.ac.th

More information