CHAPTER 3 SYSTEM DESCRIPTION


This chapter presents an overview of the system and its specifications. It also explains the purpose of using the untapped descriptive statistics measures and gives a detailed description of the system with flowcharts, procedures and samples.

3.1 SYSTEM OVERVIEW

Figure 3.1 Overview of the System

Figure 3.1 above depicts the overview of the system. The Descriptive Statistics Based Comprehensive Segmentation System for Unconstrained Handwritten Text was developed using the Java Development Kit (JDK) 1.6. Preprocessing was done using the ImageJ 1.45m tool. The system runs on the Microsoft Windows operating system with an Intel Pentium Dual-Core processor,

1.6 GHz speed and 1 GB RAM. It accepts document image files in 256 grey levels as input and produces character image files as output in JPEG format. MATLAB was used for writing scripts.

The system uses the descriptive statistics measures of central tendency - arithmetic mean, inter quartile mean and trimmed mean - for thresholding. It has four modules - a preprocessing module, a line segmentation module, a word segmentation module and a character segmentation module - which are integrated one after another. These modules are also capable of running independently, receiving appropriate input and producing output at any stage. The steps followed in each module are briefly described here.

Preprocessing Module

Preprocessing improves the quality of digital images to achieve better accuracy of segmentation and recognition. It involves enhancing some features and eliminating some inconsistencies in the image. The preprocessing module has four sub-modules for skew angle correction, grayscale conversion, background subtraction and binarisation. The ImageJ 1.45m tool is used for preprocessing the images in this research.

A text image is obtained from the handwritten document image database. The projection profile method is used to estimate the document skew, which is corrected using the rotate function of the ImageJ tool. The skew-corrected document image is converted to a grayscale image. Using the rolling ball algorithm, the Subtract Background command of the ImageJ tool removes smooth continuous background and corrects unevenly illuminated background in the image. Otsu's method of binarisation calculates the optimum threshold value for converting the grayscale image into a binary image.

Line Segmentation Module

The preprocessed image is subjected to line segmentation in this module. The module constructs the horizontal projection profile of the document image; the projection profile method, a top down approach, is followed. Projection profile analysis is based on the identification of the minima in the horizontal projection profile. The type of text lines - well separated, sharing, touching or overlapping - is identified first. Well separated lines are segmented at the minima of the horizontal projection profile. Sharing lines are segmented by core detection, separator line fixing and line boundary setting, carried out consecutively. Touching and overlapping lines are segmented using thinning and subtraction techniques. Lines with irregular baselines are segmented at positions where the distance from the top to the first black pixel hit is less than the threshold value. Short lines are segmented as sharing lines after identifying the core boundaries.

Word Segmentation Module

The segmented lines are the input for this module. The vertical projection profile of the line image is constructed. Word segmentation begins with finding the gap width (gw) between the components, where a component can be a word or part of a word. The number of gaps (NG) between the components and the total gap width (GW) are calculated. A gap between adjacent components is classified as a gap between words (inter-word gap) when it is greater than the threshold, and as a gap within a word (intra-word gap) otherwise. Lines are segmented into words at the inter-word gaps.

Based on the gap widths (gw), the threshold value is chosen as the arithmetic mean, inter quartile mean or trimmed mean. If the distance between characters is less than the distance between words, the arithmetic mean is used as the threshold value. If the distance between characters is greater than or equal to the distance between words, the inter quartile mean is used as the threshold

value. In the case of overlapping components, the segmentation is restricted to the core region and the trimmed mean is used as the threshold value for the core boundary prior to gap classification.

Character Segmentation Module

In the character segmentation module, the image obtained from the word segmentation module is decomposed into sub-images of individual symbols or characters. The classical or dissection method of character segmentation is employed in the module. Extraction of transition features along with stroke height estimation identifies loop, valley and stem / hook type characters. Detection and classification of ligatures into inter-letter links or intra-letter links lead to the generation of possible segmentation points. For segmenting shadow characters, the trimmed mean of the pixel distribution is used as the threshold value. For characters with touching boundaries, a modified histogram is constructed for the characters. For broken characters, the valid segmentation points are filled with black pixels by the dilation technique.

3.2 DESCRIPTIVE STATISTICS - THE UNTAPPED

Arithmetic Mean

Arithmetic Mean (AM) is the average of a set of data samples, calculated by using the formula in Equation (3.1):

    AM = (1/n) * Σ x_i    (3.1)

where x_i is the value of the i-th data sample and n is the number of samples.

AM would be a suitable estimate of central tendency when the data are normal. When the data are non-normal, eliminating extreme samples results in a better estimate of central tendency. The midrange of a set of data samples is the AM of the minimum and maximum values in the set. It is highly sensitive to outliers (sample values that cause surprise relative to the majority of the samples) and ignores all but the lowest and highest values. It is therefore very non-robust and rarely used in statistical analysis. For detecting the core region exactly, the ascenders and descenders have to be neglected as outliers, since they could confuse the standard statistical methods.

Inter Quartile Mean

Inter Quartile Mean (IQM) is the truncated mean obtained by discarding the lowest 25 percent and the highest 25 percent of the dataset after sorting it. It is a robust estimator that is insensitive to outliers, since they are not used in the calculation of the IQM. Because extreme values do not influence the estimate, the IQM is suitable for highly skewed or erratic distributions. IQM is calculated by using the formula in Equation (3.2):

    IQM = (2/n) * Σ x_i, for i = (n/4)+1 to 3n/4    (3.2)

Trimmed Mean

Trimmed Mean (TM) is the truncated mean obtained by discarding the lowest t percent and the highest t percent of the dataset. TM reduces the effect of outliers on the calculated average. It does not allow abnormally extreme values of the dataset to influence the mean value and is resistant to gross error.
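The three measures can be sketched in Python as follows. This is only an illustrative fragment, not the thesis implementation (which used Java and MATLAB); the IQM sketch assumes the sample count is divisible by four, and the trim count of the TM is taken as the integer part of k percent of n.

```python
def arithmetic_mean(values):
    # Equation (3.1): sum of the samples divided by their count
    return sum(values) / len(values)

def interquartile_mean(values):
    # Equation (3.2): sort, discard the lowest and highest 25 percent,
    # then average the middle half (assumes len(values) divisible by 4)
    data = sorted(values)
    q = len(data) // 4
    middle = data[q:len(data) - q]
    return sum(middle) / len(middle)

def trimmed_mean(values, k):
    # Equation (3.3): discard the lowest and highest k percent after sorting
    data = sorted(values)
    r = int(len(data) * k / 100)
    kept = data[r:len(data) - r]
    return sum(kept) / len(kept)
```

With a gross outlier such as 100 in [1, 2, 3, 4, 100], the AM jumps to 22 while the 20 percent TM stays at 3, which is why the trimmed measures are preferred where ascenders and descenders distort the distribution.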

For most statistical applications, 5 to 25 percent of the ends are discarded. The K percent TM of N values is computed by using the formula in Equation (3.3) after sorting the values:

    TM = (1/(N - 2R)) * Σ x_i, for i = R+1 to N-R, where R = ⌊KN/100⌋    (3.3)

These are the most suitable descriptive statistical measures of central tendency, yet they are not used as much as they should be for datasets with large, erratic deviations or extremely skewed distributions. These factors motivated the researcher to employ the measures, which are used as threshold values appropriately in the modules.

3.3 DETAILED DESCRIPTION

Preprocessing Module

The preprocessing was done by using the ImageJ 1.45m tool, a public domain image processing and analysis program. ImageJ runs either as an online applet or as a downloadable application on any computer with a Java 1.5 or later virtual machine. Downloadable distributions are available for Windows, Mac OS X and Linux. It can display, edit, analyze, process, save and print 8 bit, 16 bit and 32 bit images, and it can read many image formats including TIFF, GIF, JPEG, BMP, DICOM, FITS and raw.

The following steps were carried out in the preprocessing module of the system.

1. An unconstrained handwritten text image was obtained from a dataset.

2. The skew of the image was detected by the projection profile method and corrected by the rotate function of the ImageJ tool.

3. In the case of a coloured image, it was converted to a grayscale image using the formula (0.3 x R) + (0.6 x G) + (0.1 x B), to reduce the original intensity values of the image to fewer bits for lower complexity (R - Red, G - Green and B - Blue scale).

4. Background subtraction was carried out with local thresholding, using the rolling ball algorithm of the ImageJ tool, to reduce the noise in the image and retain the data.

5. The grayscale image was converted to a binary image using Otsu's method, first with a local threshold algorithm and then with a global threshold algorithm, where the pixels with intensity value greater than the threshold value were set to 0 (black) and the rest were set to 1 (white).

These steps are explained below in detail with a flowchart (Figure 3.2).

Figure 3.2 Flowchart for Preprocessing
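Steps 3 and 5 can be sketched per pixel as follows. This is an illustrative Python fragment, not the system's Java code; `threshold` stands for whatever value Otsu's method returns.

```python
def to_gray(r, g, b):
    # Step 3: weighted sum (0.3 x R) + (0.6 x G) + (0.1 x B)
    return 0.3 * r + 0.6 * g + 0.1 * b

def binarise_pixel(gray, threshold):
    # Step 5 convention: intensity above the threshold -> 0 (black),
    # the rest -> 1 (white)
    return 0 if gray > threshold else 1
```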

Figure 3.3 Sample Input Image

Skew Detection and Correction

The projection profile method is a popular skew estimation technique used in DIA. It is well suited for skew angles from +10° to -10° (Chaudhuri and Pal 1997). Since the document images would have a maximum skew of 10°, this method was selected for use in the system. To determine the skew of the document, the horizontal projection profile was computed at a number of angles, and for each angle a measure of the difference between peak and trough height was

made. The angle with the maximum difference corresponds to the best alignment with the text line direction and determined the skew angle. The document image was then rotated in the reverse direction by the estimated angle for skew correction, using the rotate function of the ImageJ tool. Figure 3.4 shows the image corrected by the 2° skew angle estimated in a sample image from the researcher's dataset (Figure 3.3).

Figure 3.4 Output After Skew Correction
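The projection-profile search can be illustrated with a minimal Python sketch. It uses a shear as an approximation of small rotations and a plain peak-trough score; the thesis itself used ImageJ's rotate function, so all names and details here are assumptions for illustration.

```python
import math

def horizontal_profile(points, height):
    # count foreground points per row
    profile = [0] * height
    for _, y in points:
        if 0 <= y < height:
            profile[y] += 1
    return profile

def estimate_skew(points, height, angles):
    # keep the candidate angle whose deskewed profile has the largest
    # peak-trough difference
    best_angle, best_score = 0.0, -1
    for angle in angles:
        shear = math.tan(math.radians(angle))
        sheared = [(x, round(y - x * shear)) for x, y in points]
        profile = horizontal_profile(sheared, height)
        score = max(profile) - min(profile)
        if score > best_score:
            best_score, best_angle = score, angle
    return best_angle
```

At the correct candidate angle the foreground of a text line collapses into few rows, so the profile's peak is sharpest there.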

Grayscale Conversion

The sample shown, being a colour image, was converted into a grayscale image (Figure 3.5) after skew correction.

Figure 3.5 Output After Grayscale Conversion

Background Subtraction

Noise removal is one of the steps in preprocessing. It covers techniques for cleaning, smoothing and enhancing images. Noise reduction techniques are categorised into filtering and morphological techniques. Noise can be removed to a certain extent using filtering techniques. Morphological techniques like dilation*, erosion* and skeletonisation analyse the images to extract or modify the information in an image.

Document images suffering from degradations like uneven illumination, image contrast variation, bleed-through and smear make document image binarisation a challenging task. Background subtraction is the process of subtracting the background of the image (Figure 3.6) to eliminate its effect on segmentation. It can be done by subtracting a constant pixel value globally from each pixel value of the image, by subtracting a reference background from every image, or by the rolling ball method.

Figure 3.6 Background Subtraction
Source: Kathryn Leonard, California State University

* Erosion makes an object smaller by removing or eroding away the pixels on its edges; dilation makes an object larger by adding pixels around its edges.

Figure 3.7 Document Images with Uneven Illumination

Smooth continuous background was removed and unevenly illuminated background (Figure 3.7) was corrected in the images by the Subtract Background command (Figure 3.8) of the ImageJ tool, based on the rolling ball algorithm developed by Stanley Sternberg (1983).

Figure 3.8 Subtract Background of ImageJ Tool

Grayscale opening is grayscale erosion followed by grayscale dilation, and grayscale closing is grayscale dilation followed by grayscale erosion. Grayscale opening can be viewed as a ball rolling under the vertical projection profile of the image, pushing up on the underside; the result is the surface of the highest points reached by any part of the rolling ball. Grayscale closing can be viewed as the ball rolling on the vertical projection profile of the image, pressing down on the top; the result is the surface of the lowest points reached by any part of the rolling ball (Figure 3.9).

Figure 3.9 Rolling Ball Algorithm
Source: Kathryn Leonard, California State University

Grayscale opening smooths the brightness of the image from above and grayscale closing smooths it from below. They removed small local maxima or minima without affecting the grey values of larger objects. Bright features smaller than the ball were reduced in intensity, while larger features remained more or less unchanged in intensity. The rolling ball radius should be the size of the largest object in the image; here the radius of the ball was set to 50 pixels. The ball identified the smoother background surface, ignoring features narrower than the radius of the ball; these narrow features were left after subtraction of the background.
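The effect can be illustrated with a simplified 1-D Python sketch that uses a flat window of half-width `radius` in place of a true 2-D ball (an assumption made for brevity; ImageJ operates on the full 2-D image). Opening the signal gives a background estimate in which features narrower than the window vanish; subtracting it leaves only those narrow features.

```python
def opening(signal, radius):
    # grayscale erosion then dilation with a flat window of half-width radius
    def erode(s):
        return [min(s[max(0, i - radius):i + radius + 1]) for i in range(len(s))]
    def dilate(s):
        return [max(s[max(0, i - radius):i + radius + 1]) for i in range(len(s))]
    return dilate(erode(signal))

def subtract_background(signal, radius):
    background = opening(signal, radius)   # smooth background estimate
    return [v - b for v, b in zip(signal, background)]
```

A single bright spike narrower than the window survives the subtraction, while a plateau wider than the window is absorbed into the background and removed.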

The subtraction is based on the surroundings. The rolling ball processed the image based on operations defined by the pixel values of a small surrounding region of the image. This is a local background subtraction method, based on contributions from a defined number of adjacent pixels. Figure 3.10 shows the image after background subtraction.

Figure 3.10 Output After Background Subtraction

Binarisation

Binarisation can be performed by global or local thresholding. Global binarisation gives a single threshold value for the entire image; its results are good for document images with uniform illumination. Images with non-uniform illumination might suffer from loss when global thresholding methods are employed. In local thresholding, the input image is fragmented into windows and the threshold for each window is determined by using a simple mean to binarise the grayscale image.

Thresholding is a basic operation in image processing. Otsu's method, created by Nobuyuki Otsu (1979), was followed in the system to calculate the optimum threshold value for converting the grayscale image into a binary image. Otsu's thresholding is a simple and effective global automatic thresholding method. It makes the binarisation decision automatically, based on the shape of the histogram. Its applications range from medical imaging to low level computer vision. It operates on histograms, which are integer or float arrays of length 256. The algorithm is quite fast and takes approximately 80 lines of code, but it works on uniformly illuminated images and the histogram should be bimodal. With an even background without noise, binarisation using Otsu's method produces better results.

The threshold value t was calculated by the following steps:

1. The pixels were separated into two clusters according to the threshold.

2. The mean of each cluster was found and the difference between the means was squared.

3. This was multiplied by the number of pixels in one cluster and in the other cluster.

The above steps are mathematically represented as below.

1. Let ω1(t) and ω2(t) represent the estimates of the class probabilities, defined as Equations (3.4) and (3.5):

    ω1(t) = Σ p(i), for i = 1 to t    (3.4)

    ω2(t) = Σ p(i), for i = t+1 to 256    (3.5)

where p(i) represents the image histogram.

2. Let σ1²(t) and σ2²(t) represent the individual class variances, defined as Equations (3.6) and (3.7):

    σ1²(t) = Σ [i - μ1(t)]² p(i)/ω1(t), for i = 1 to t    (3.6)

    σ2²(t) = Σ [i - μ2(t)]² p(i)/ω2(t), for i = t+1 to 256    (3.7)

3. Let μ1(t) and μ2(t) represent the class means, defined as Equations (3.8) and (3.9):

    μ1(t) = Σ i p(i)/ω1(t), for i = 1 to t    (3.8)

    μ2(t) = Σ i p(i)/ω2(t), for i = t+1 to 256    (3.9)

4. The problem of minimising the within-class variance σw²(t) = ω1(t)σ1²(t) + ω2(t)σ2²(t) can be expressed as a maximisation problem of the between-class variance, written as the difference of the total variance and the within-class variance as Equation (3.10):

    σb²(t) = σ² - σw²(t) = ω1(t)ω2(t)[μ1(t) - μ2(t)]²    (3.10)

5. This expression is maximised, and the solution is the threshold t that maximises σb²(t).

The algorithm assumed that the image contains two classes of pixels - foreground and background. The pixels with value greater than the threshold were set to 0 (black) and the rest were set to 1 (white).

Figure 3.11 Otsu's Threshold Value Calculation
Source: Otsu (1979)

Otsu's method found the binary threshold by iterating through all the possible threshold values and calculating a measure of spread for the pixel levels on each side of the threshold - foreground and background. The aim was to find the threshold value where the sum of the foreground and background spreads is at its minimum. Figure 3.12 shows the output after binarisation.
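The search can be condensed into a short Python sketch that maximises the between-class variance of Equation (3.10) for every candidate threshold. This is an illustrative O(256²) version, not the ImageJ implementation.

```python
def otsu_threshold(histogram):
    # histogram: counts per grey level (length 256 for 8-bit images)
    total = sum(histogram)
    levels = len(histogram)
    best_t, best_var = 0, -1.0
    for t in range(1, levels):
        w1 = sum(histogram[:t]) / total          # omega_1(t), Eq. (3.4)
        w2 = 1.0 - w1                            # omega_2(t), Eq. (3.5)
        if w1 == 0.0 or w2 == 0.0:
            continue                             # all pixels in one class
        mu1 = sum(i * histogram[i] for i in range(t)) / (w1 * total)
        mu2 = sum(i * histogram[i] for i in range(t, levels)) / (w2 * total)
        between = w1 * w2 * (mu1 - mu2) ** 2     # sigma_b^2(t), Eq. (3.10)
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

For a clean bimodal histogram every threshold strictly between the two modes maximises the between-class variance; the sketch returns the first such value.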

Figure 3.12 Output After Binarisation

Line Segmentation Module

The preprocessed image obtained from the ImageJ tool was subjected to line segmentation in this module. A novel algorithm (Figure 3.13) was proposed for extracting the text lines from handwritten document images. This algorithm constructed the horizontal projection profile and extracted the core boundary using the TM of the foreground pixels as the distance metric to segment the image into text lines. The global projection profile method, a top down approach,

is followed in the module. The type of the text lines - well separated, sharing, touching or overlapping - was identified before segmenting them from the document image, using the following criterion: if the horizontal projection profile P(i) is equal to 0, the lines do not share pixels and are neither touching nor overlapping; otherwise the lines share pixels by touching or overlapping.

Figure 3.13 Flowchart for Line Segmentation

The line segmentation algorithm followed in the system is illustrated with a sample image (Figure 3.14) taken from the researcher's dataset after preprocessing. Well separated lines that do not share pixels and are neither touching nor overlapping were segmented from the document image with the delimiting minima points as the parameter. The horizontal projection profile was used in line segmentation. From the horizontal projection profile, the gaps between the text lines were found. Lines were delimited by searching for the minima in the projection profile around each maximum. The number of maxima of the profile gave the number of lines. This procedure was repeated over the full height of the image.

Figure 3.14 Sample Image for Line Segmentation and its Horizontal Projection Profile
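For well separated lines the procedure reduces to a few lines of Python. This is an illustrative sketch; note that, unlike the binarisation convention above, foreground (black) pixels are represented by 1 here so that row sums count them directly.

```python
def horizontal_projection(image):
    # image: list of rows, each a list of 0/1 pixels (1 = black foreground)
    return [sum(row) for row in image]

def segment_lines(image):
    profile = horizontal_projection(image)
    lines, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                      # a text line begins
        elif count == 0 and start is not None:
            lines.append((start, y - 1))   # minimum reached: close the line
            start = None
    if start is not None:
        lines.append((start, len(profile) - 1))
    return lines
```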

Sharing lines were segmented by the core detection, separator line fixing and line boundary setting procedures (Figure 3.15), carried out consecutively.

Figure 3.15 Flowchart for Core Detection, Separator Line Fixing and Line Boundary Setting

Core Detection

Line segmentation begins with detecting the core boundary of each text line in the image. From the constructed horizontal projection profile, the core region was plotted as the area from the point where the pixel distribution rises above the threshold to the point where the pixel distribution falls below the threshold (Gomathi alias Rohini et al 2013). Here the TM was used as the threshold value, since it reduces the effect of outliers in the calculation.

From the horizontal projection profile, the core boundary threshold (CBT) was calculated as the TM of the foreground pixel values. While scanning the profile, the row at which the pixel count rose above the core boundary threshold was fixed as the core upper boundary (CUB), and the row at which it fell back below the threshold was fixed as the core lower boundary (CLB) (Figure 3.16). The range of the core boundary fell between 70 and 100 pixels.

Figure 3.16 Core Detection in the Sample Image
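The core-detection rule can be sketched as follows. This is illustrative Python; the 25 percent trim is an assumption, since the chapter does not state the trim fraction used for the CBT.

```python
def trimmed_mean(values, k=25):
    # discard the lowest and highest k percent after sorting (Eq. 3.3)
    data = sorted(values)
    r = int(len(data) * k / 100)
    kept = data[r:len(data) - r]
    return sum(kept) / len(kept)

def detect_core(profile, k=25):
    # CBT: trimmed mean of the row-wise foreground pixel counts
    cbt = trimmed_mean(profile, k)
    cub = clb = None
    for y, count in enumerate(profile):
        if count > cbt and cub is None:
            cub = y                      # first row above the CBT: CUB
        elif count <= cbt and cub is not None and clb is None:
            clb = y - 1                  # count falls back below: CLB
    if clb is None and cub is not None:
        clb = len(profile) - 1
    return cub, clb
```

For a profile such as [1, 2, 10, 12, 11, 2, 1] the trimmed mean sits between the body of the line and the sparse ascender/descender rows, so the core spans rows 2 to 4.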

Separator Line Fixing

Separator Lines (SL) were estimated by (CLB1 + CUB2)/2, where CLB1 is the CLB of the first line and CUB2 is the CUB of the second line. The SLs gave the midpoints between the core regions and were drawn at the centre of the successive core regions of the adjacent text lines (Figure 3.17).

Figure 3.17 Separator Line Fixing in the Sample Image

Line Boundary Setting

The midpoint between the SL and CUB2 was set as the line lower boundary of line 1 (LLB1), and the midpoint between the SL and CLB1 was set as the line upper boundary of line 2 (LUB2). If a denser document is segmented, the resulting lines would contain noise. Midpoint lines - Mid Upper (MU) and Mid Lower (ML) (Figure 3.18) - were drawn between the separator line and the core regions (upper baseline or lower baseline of the adjacent text lines) CLB1 and CUB2, and marked as segmentation points for the text lines.

Figure 3.18 Separator Line (solid line) with MU (dashed lines) and ML (dotted lines)

Then the image was segmented at the midpoint line positions (Figure 3.19). Here the height of each component would be equal to the average height of the components.

Figure 3.19 Line Boundary Setting in the Sample Image (lower boundary marked as a blue dashed line and upper boundary marked as an orange solid line)

Touching and Overlapping Lines

Touching and overlapping components are challenges in text line extraction, since there is no white space left between the lines. Some methods, like the grouping method and the Repulsive Attractive Network (RAN) method, did not need to detect such components because they

extracted only baselines. Some criteria, like the quality threshold in the stochastic method, made the paths avoid crossing the black pixels. Criteria like the component size and the fact that a component belongs to several alignments or to no alignment could be used for detecting ambiguous components (Kim et al 1999). Fei Yin and Cheng-Lin Liu (2007) classified the ambiguous components into three categories:

1. a touching component, which has to be decomposed into two or more parts,
2. an overlapping component which belongs to the upper alignment with respect to the lower, and
3. an overlapping component which belongs to the lower alignment with respect to the upper.

The developed system segmented the touching and overlapping lines (Figure 3.20) using thinning and subtraction techniques. If the height of a component having an intersection with the separating line is greater than the average height of the components, it was judged a touching component, otherwise an overlapping component (Nallapareddy Priyanka et al 2010).

Figure 3.20 Sample Image with Touching and Overlapping Lines

If the touching and overlapping lines are segmented as sharing lines, the output would have noise due to the touching or overlapping components (Figure 3.21). The overlapping might occur due to components with or without loops.

Figure 3.21 Output with Noise

Figure 3.22 Touching and Overlapping Components in the Sample Image (shown by red circles)

Figure 3.22 above shows the touching and overlapping components in the sample image, circled in red. The following operations were performed to segment them.

1. The touching and overlapping areas were located (Figure 3.23).

Figure 3.23 Locating Overlapping Area

2. Each component was thinned down to a skeleton of unitary thickness. The parallel thinning technique was used on the touching and overlapping components to capture the structural knowledge first. This morphological operation was done using the ImageJ tool. Extracting the skeleton of a picture consists of removing all the contour points of the picture except those points that belong to the skeleton (Zhang and Suen 1984). The deletion of an individual pixel was based on the results of the previous iteration in parallel thinning, considering a 3x3 neighbourhood around the current pixel. The first iteration deleted the south-east boundary points and the north-west corner points, and the second iteration deleted the north-west boundary points and the south-east corner points.

3. Transitional features from foreground pixel to background pixel and vice versa (run-length) of the thinned component were captured. The type of component interconnection was determined (Kalyan Takru and Graham Leedham 2002) based on the number of transitional features. Transitional features of some foreground pixels might be skipped if their transitional position value does not exceed the threshold value IQM.

4. The overlapping components were removed by eroding at the positions mentioned in Table 3.1, based on the number of transitions, and the thinned image was split into separate components.

Table 3.1 Removal of Overlapping Components

No. | Type of Interconnection | Erode Position
1 | A descender with a loop touching a vertical ascender or a lower-case letter in the word below | 1,2
2 | A descender with a loop overlapping a vertical ascender | 3,4
3 | A vertical descender touching lowercase letters or the curving top of an ascender or a capital letter | 3,4
4 | A vertical descender meets a vertical ascender | 3,6
5 | A descender with a loop with an ascender with a loop | 3,4,5,6

5. Then one of the touching or overlapping components was extracted. The extracted image was subtracted from the thinned image to obtain the other component. This procedure is illustrated in Figure 3.24 below: locating the touching and overlapping components, the image after thinning, calculating the run-length, component 1 extracted, and component 2 extracted after subtracting component 1.

Figure 3.24 Segmenting Touching and Overlapping Components
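The transition (run-length) feature of step 3 is simply the number of foreground/background changes along a scan of the thinned component; the count indexes the interconnection type in Table 3.1. A minimal, illustrative sketch:

```python
def count_transitions(scanline):
    # scanline: sequence of 0/1 pixels; a transition is any change
    # 0 -> 1 or 1 -> 0 between neighbouring pixels
    return sum(1 for a, b in zip(scanline, scanline[1:]) if a != b)
```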

Irregular Base Lines

Figure 3.25 Sample Image with Irregular Baseline

The position of the irregular baselines is located by finding the peak in the horizontal projection profile only in the bottom half of the image. Lines with an irregular base (Figure 3.25) were segmented at positions where the distance from the top to the first black pixel hit is less than the TM of the distances. The TM was used as the threshold value in extracting the irregular base lines. To segment the lines having an irregular base, the following procedure was followed (Figure 3.27).

1. At the top of the image, a horizontal line was drawn from left to right, where there was no black pixel hit.

2. From left to right of the horizontal line, at regular intervals, going from top to bottom, the distances from the top extreme to the first black pixel hit were found (Figure 3.26) and their TM was calculated, since it does not allow extreme values to influence the mean value.

Figure 3.26 Distance Measurement in Irregular Baseline

Figure 3.27 Flowchart for Irregular Baseline Segmentation

3. If the distance from the horizontal line to the first black pixel is less than the TM, a line was drawn from the top extreme to this point; this is the irregular base line identified. Otherwise, this line was skipped. This step was repeated until the right end was reached (Gomathi alias Rohini and Umadevi 2012).

The following MATLAB script was used to separate the irregular base lines with different colours (Figure 3.28).

% Read the image and display the original
I = imread('shotlin.jpg');
figure, imshow(I), title('original image')
% Binarise with Otsu's global threshold
BW = im2bw(I, graythresh(I));
% Label the connected components and map the labels to colours
L = bwlabel(BW);
RGB = label2rgb(L);
RGB2 = label2rgb(L, 'spring', 'c', 'shuffle');
imshow(RGB), figure, imshow(RGB2)

Figure 3.28 Separated Irregular Base Lines

Short Lines

Figure 3.29 Sample Image with Short Lines

Short lines were segmented by using core detection and identifying the core boundaries. The TM of the foreground pixel values was used to detect the core boundary and as the threshold in spotting the short lines. When the projection profile method was employed on images containing short lines, it resulted in under-segmentation. To segment the short text lines (Figure 3.29) from the image, the following procedure was followed.

The region of interest was reduced to the area between two core regions. The core region was detected using the core detection procedure, by finding the TM (being resistant to gross error) within the region of interest and locating the core region (Figure 3.30). The CLB and CUB were set, then the SL was set, and the LLB and LUB were fixed. The line with these points was cut.

Figure 3.30 Short Line Identification and Reduced Region of Interest

Figure 3.31 Output After Line Segmentation

Thus the image was segmented into lines (Figure 3.31). The segmented lines were stored in a folder as individual JPEG files.

The developed line segmentation module segmented the lines incorrectly when a strip of characters was present between the lines (Figure 3.32), called a line merge error. There were also cases where characters written smaller in size above a line got segmented, which was not supposed to happen (Figure 3.33), called a line split error. This might be due to various reasons, which can be analysed only through further research.

Figure 3.32 Line Merge Error

Figure 3.33 Line Split Error

Word Segmentation Module

In this module, the following procedure was followed to segment the lines into words (Figure 3.34).

Figure 3.34 Flowchart for Word Segmentation

1. The segmented line images from the line segmentation module are the inputs to this module. At first, the vertical projection profile of the segmented line image was obtained (Figure 3.35).

Figure 3.35 Sample Word Image with its Vertical Projection Profile

2. The image was traced vertically down to find the first hit of a black pixel, and segmentation points were decided depending on the row number and column number of these black pixels. From top to bottom of each line, a vertical scan was performed from the 0th pixel to the height of the image to compute the distance between successive CCs. If no black pixels were encountered, the scan was denoted by 1, otherwise by 0. This process was continued throughout the width of the image from the left end.

3. These scan results were stored in a one dimensional array called the distance metric (DM) array, which holds the foreground and background information of the image (Figure 3.36).

Figure 3.36 Binary Representation of the Line Image and its DM Array

4. In the DM array, the presence of 0 indicated a CC and 1 indicated a gap. A CC could be a word or part of a word. The width of each gap (gw) was calculated as the count of consecutive white runs (Figure 3.37). The variations in the gaps (GV) were calculated.

The number of transitions from 0 to 1 or 1 to 0 in the array was divided by 2 to find the number of gaps (NG). The total gap width (GW) was calculated by using Equation (3.11):

    GW = Σ gw_i, for i = 1 to NG    (3.11)

Figure 3.37 Gap Width Calculation

5. With these gap metrics - gw, NG, GW and GV - the identified gaps between adjacent CCs were classified as inter-word gaps (gaps between words) and intra-word gaps (gaps within words). Black pixels at least 5 pixels apart from each other were considered to belong to separate words. A threshold value for each line image was used for the gap classification. Manu Pratap Singh and Dhaka (2009) opined that more intensity leads to a higher threshold value. The gaps with a background pixel count above the threshold value were inter-word gaps and those below were intra-word gaps (Figure 3.38), and the line image was segmented at the inter-word gaps to get the words (Figure 3.39).

Figure 3.38 Gap Classification Process

Figure 3.39 Segmentation Points
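Scanning the DM array for the gap metrics can be sketched in Python as follows. This is illustrative only; in this sketch leading and trailing white runs are counted as gaps, which the real module may treat differently.

```python
def gap_metrics(dm):
    # dm: DM array with 1 = gap (white) column, 0 = CC (foreground) column
    widths, run = [], 0
    for value in dm:
        if value == 1:
            run += 1                 # inside a gap: extend the white run
        elif run:
            widths.append(run)       # gap ended: record its width g_w
            run = 0
    if run:
        widths.append(run)           # trailing gap
    ng = len(widths)                 # number of gaps, NG
    gw = sum(widths)                 # total gap width, GW (Eq. 3.11)
    return widths, ng, gw
```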

6. Based on the gap widths (gw), the threshold value was chosen as the AM, IQM or TM. It was determined as mentioned below, with respect to the cases.

a. When the distance between characters (DBC) was less than the distance between words (DBW) (Figure 3.40), where the GV was less than 30 pixels, the AM of the gw values was used as the threshold value, calculated using Equation (3.12):

    T_am = (1/NG) * Σ gw_i, for i = 1 to NG    (3.12)

If a gap width gw was greater than the threshold value T_am, the gap was marked as a word boundary; else it was ignored as a gap within the characters.

Figure 3.40 Word Segmentation with AM as Threshold Value (input image with DBC < DBW and output image after segmentation)

b. When the distance between characters was greater than or equal to the distance between words (Figure 3.41), where the GV was between 30 and 60 pixels, classifying the gaps as inter-word gaps and intra-word gaps would be difficult if the AM of the gw values were used as the threshold value, due to the big variance in gw.

Figure 3.41 Incorrect Word Segmentation with AM as Threshold Value (input image with DBC ≥ DBW and output image after segmentation)

To overcome this drawback, the IQM of the gw values was used as the threshold value, which sorts the gw values and eliminates the lowest and highest 25 percent of them to give a normalised mean suited to this type of image containing non-uniform gaps. The IQM was calculated using Equation (3.13):

    T_iqm = (2/NG) * Σ gw_i, for i = (NG/4)+1 to 3NG/4    (3.13)

Figure 3.42 shows an inter-word gap as an orange solid rectangular box and an intra-word gap as a red hollow rectangular box. The gw values were classified as inter-word gaps and intra-word gaps by comparing them with the threshold value T_iqm. Inter-word gaps are word boundaries.

Figure 3.42 Word Segmentation with IQM as Threshold Value (input image with DBC ≥ DBW and output image after segmentation)

c. In the case of overlapping, denser and skewed documents, the IQM failed to spot some valid segmentation points, since the dominance of the ascenders and descenders was greater (Figure 3.43).

Figure 3.43 Incorrect Word Segmentation with IQM as Threshold Value (input image with hidden valid segmentation point and output image using IQM as threshold)

To overcome this drawback, the TM of the g_w values was used as the threshold value, since it reduced the effect of outliers. It was calculated using Equation (3.14), and the segmentation was restricted to the core region (Figure 3.44) as explained below:

T_tm = (1/(NG − 2k)) Σ g_w(i),  i = k + 1, ..., NG − k        (3.14)

where k is the number of values trimmed from each end of the sorted g_w values.

i. To detect the core region exactly, the ascenders and descenders were neglected as outliers. That is, the peaks below the threshold TM were eliminated, reducing the region of interest to the core region. The core region was plotted in the constructed profile as the area in which the pixel distribution is greater than the threshold value.

ii. If a gap inside the core region is greater than TM, it is an inter-word gap; else it is an intra-word gap. This might result in a negligible loss of foreground pixels, but it would have little impact on the recognition phase, since only pixels in the ascenders or descenders were lost (Gomathi alias Rohini et al 2013).

Figure 3.44 Word Segmentation with TM as Threshold Value (input and output images)
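The trimmed mean can be sketched as below. This is an illustrative Python fragment under the assumption that a fixed fraction (here 10% per side, a hypothetical choice; the thesis does not state the trim fraction) is discarded from each end:

```python
def trimmed_mean(values, trim=0.1):
    """Trimmed mean (Equation 3.14): discard a fraction `trim` of
    the smallest and largest values (outliers such as gaps inflated
    by ascenders/descenders) and average the rest."""
    s = sorted(values)
    k = int(len(s) * trim)          # values trimmed per side
    kept = s[k:len(s) - k] if k else s
    return sum(kept) / len(kept)
```

A single extreme gap width no longer drags the threshold, which is exactly why TM recovers segmentation points that IQM and AM miss in skewed, dense writing.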

Due to variation in spacing between adjacent words, two kinds of errors might occur: under-segmentation, caused by merging adjacent words, and over-segmentation, caused by splitting a single word into two or more words. In some cases both might occur together. Under-segmentation could be rectified by executing this subroutine recursively, and over-segmentation could be rectified by validating the segmentation points as described above. Figure 3.45 shows the step-by-step word segmentation process.

Figure 3.45 Stepwise Word Segmentation Process (input image, core detection, core image, distance computation and output image)

Thus a line image was segmented into words. The segmented words were stored in a folder as individual JPEG files.
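The final cut of a line into words at inter-word gaps can be sketched as a single pass over the per-column ink array. A Python illustration with hypothetical names, assuming the threshold has been computed by one of the three measures above:

```python
def split_at_gaps(cols, threshold):
    """Cut a per-column ink profile (1 = ink, 0 = background) at
    gaps wider than `threshold`, returning (start, end) column
    ranges of the resulting words; narrower gaps stay inside a word."""
    words, start, gap = [], None, 0
    for i, c in enumerate(cols):
        if c:
            if start is None:
                start = i                       # first ink column of a word
            elif gap > threshold:
                words.append((start, i - gap))  # close word before the gap
                start = i
            gap = 0
        elif start is not None:
            gap += 1                            # inside a background run
    if start is not None:
        words.append((start, len(cols) - gap))
    return words
```

With threshold 2, a 1-wide gap stays inside the first word while the 3-wide gap becomes a word boundary.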

The word segmentation module split the difficult cases of vertically connected characters incorrectly (shown by red lines in Figure 3.46). This might be due to various reasons, which can be analysed only through further research.

Figure 3.46 Incorrect Word Segmentation in Vertically Connected Characters

Character Segmentation Module

In the character segmentation module, the image obtained from the word segmentation module was decomposed into sub-images of individual symbols or characters. The classical or dissection method of character segmentation was employed in this module. The module had the following three sub-modules:

1. Transition Feature Extraction
2. Ligature Detection and
3. Ligature Classification

The character segmentation procedure followed in this module is explained below (Figure 3.47).

Figure 3.47 Flowchart for Character Segmentation

Extraction of Transition Feature

1. The word image obtained from the previous module was read as a 2D array. Vertical transitions from 0 to 1 or 1 to 0 (indicated by the green box in Figure 3.48) were extracted.

Figure 3.48 Binary Image with Transition Features Marked

2. By a vertical scan over this 2D array (matrix), the number of transitions (NT) from 0 to 1 and 1 to 0 was recorded and stored in an array.

3. Columns with NT less than 2 were marked as Possible Segmentation Points (PSP). If NT is greater than 2, it indicates the presence of a loop or semi-loop, and PSPs could be avoided in that area.

4. NT = 2 denotes characters like k, h, j and l. To prevent dissection of the stem of these characters, stroke height (SH) was added as a constraint. SH estimation is a two-scan procedure.

a. The first scan, on each column of the binary image, constructed an SH profile by calculating the black pixel runs in the vertical direction. The mean height estimated over all the columns was taken as the upper bound (maximum height) for the run-length of the strokes.

b. The second scan, on the SH profile, discarded those strokes whose run-length was greater than the maximum height.

5. The SH of the input word image was estimated as the average height of the strokes retained in the second scan. PSPs were generated based on the transition feature along with stroke height (SH)

analysis. Then, all PSPs were validated using ligature classification. The extraction of the transition feature was not affected by stroke thickness.

6. By validating the PSPs with another constraint, namely that the number of pixels in the column be less than the average SH, some of the PSPs were removed to reduce the complexity of segmentation. Numerous incorrect segmentation points were dropped accurately, and the correct segmentation points were left as validated segmentation points.

7. The columns that did not satisfy the above two constraints were converted to 1s, shown as black lines in Figure 3.49. A black line in the image indicates a PSP, and consecutive lines were grouped as segmentation blocks (SB). These blocks indicated the ligatures (connectors) and were stored in an SB array.

Figure 3.49 Segmentation Blocks and Inter Block Space

Ligature Detection and Classification

Detection and classification of each ligature into an inter-letter link or an intra-letter link led to the generation of PSPs. A certain pattern occurs when human beings write words (Gomathi alias Rohini et al 2012). For example, the movement of the hand is slower while writing a character and faster while writing the ligature between characters. The length of the ligature between characters differs from that of the ligature within characters. These phenomena motivated the use of ligatures in the character segmentation procedure.

1. To classify the ligatures as inter-letter links (links between characters) and intra-letter links (links within characters), the inter block space (IBS) was calculated (Figure 3.49). The IBS is

the distance between consecutive blocks representing foreground pixels. These distances were stored in an array called the IBS array.

2. If a distance in the IBS array was less than one third of the height of the image, it implied that a character was unlikely to fit in less than this width. So the adjacent blocks were removed and the IBS array was altered accordingly. Thus, the IBS array held only the valid character widths. Those columns were extracted from the IBS array and passed as parameters to segment the characters.

3. To avoid over-segmentation of w and m, in which the width of the character is greater than or equal to 3/2 of the height of the image, ligature classification was done using the inverse of the IBS array. The threshold value to classify the ligatures was calculated as the AM of all the elements in the SB array, since they are of normal size. If the block width is less than the threshold value, it is an inter-letter link; otherwise it is an intra-letter link.

Ligature classification showed promising accuracy for normal writing styles. The developed module showed novelty by preventing over-segmentation of characters like u, v, n and k. This approach worked well for normal characters, without any change in the region of interest (Figure 3.50).

Figure 3.50 Character Segmentation - Normal Cases
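The transition-feature, stroke-height and ligature-classification steps above can be sketched together. This is an illustrative Python sketch (the thesis system was written in Java); a binary image is modelled as a list of rows of 0/1, and all function names are hypothetical:

```python
def column_transitions(img):
    """Per-column count of vertical 0->1 / 1->0 transitions (NT);
    columns with NT < 2 are candidate segmentation points (PSP)."""
    h = len(img)
    return [sum(1 for r in range(h - 1) if img[r][c] != img[r + 1][c])
            for c in range(len(img[0]))]

def stroke_height(img):
    """Two-scan SH estimate: collect vertical black-pixel run
    lengths per column, discard runs above the mean run length
    (ascenders/descenders), and average the remaining runs."""
    runs = []
    for c in range(len(img[0])):
        run = 0
        for r in range(len(img)):
            if img[r][c]:
                run += 1
            else:
                if run:
                    runs.append(run)
                run = 0
        if run:
            runs.append(run)
    if not runs:
        return 0.0
    upper = sum(runs) / len(runs)              # first-scan upper bound
    kept = [r for r in runs if r <= upper]     # second scan
    return sum(kept) / len(kept)

def segmentation_blocks(psp_flags):
    """Group consecutive PSP columns into (start, width) blocks,
    each representing a ligature."""
    blocks, start = [], None
    for i, f in enumerate(list(psp_flags) + [False]):
        if f and start is None:
            start = i
        elif not f and start is not None:
            blocks.append((start, i - start))
            start = None
    return blocks

def classify_ligatures(blocks):
    """AM of the block widths as threshold: narrower blocks are
    inter-letter links (cut points), wider ones intra-letter links."""
    widths = [w for _, w in blocks]
    t = sum(widths) / len(widths)
    return ['inter' if w < t else 'intra' for w in widths]
```

The SH constraint keeps tall single strokes (stems of k, h, j, l) from being cut, while the block-width threshold keeps the wide interior links of u, v, n and m intact.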

Broken characters are characters in which black pixels may be absent throughout a column (Figure 3.51). In handwritten documents, such cases rarely occur. The developed module segmented this type of character without any error as well.

Figure 3.51 Broken Characters

There are cases with a dominance of ascenders and descenders, which might cause some valid segmentation points to be missed. For these cases, certain processing was done prior to ligature detection: the region of interest was restricted to the core region, followed by ligature detection and classification to capture the hidden valid segmentation points.

Shadow Character Segmentation

Shadow characters are characters over which the extension of an ascender or descender from another character shares their columns. Supremacy of ascenders and descenders in words might hide valid segmentation points. To segment the shadow characters and locate the hidden valid segmentation points, the core region was checked. It was plotted in the constructed horizontal projection profile of the word image as the area in which the pixel distribution is greater than the threshold value, the TM of the pixel distribution, since it does not admit extreme values. Figure 3.52 shows a sample step-by-step shadow character segmentation process: the dominance of the descender of the letter z hides the valid segmentation point between the letters a and z.
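The TM-based core-region detection just described can be sketched as follows (an illustrative Python fragment with a hypothetical 10% trim fraction; the thesis does not state the exact trim used):

```python
def core_region(row_profile, trim=0.1):
    """Rows whose horizontal-projection count exceeds the trimmed
    mean of the profile form the core region; ascender/descender
    rows fall below the threshold and are excluded."""
    s = sorted(row_profile)
    k = int(len(s) * trim)
    kept = s[k:len(s) - k] if k else s
    t = sum(kept) / len(kept)                      # TM of pixel distribution
    rows = [i for i, v in enumerate(row_profile) if v > t]
    return (rows[0], rows[-1]) if rows else None   # top and bottom core rows
```

Restricting ligature detection to this row band removes the columns that ascenders and descenders of neighbouring characters spill into.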

Figure 3.52 Shadow Character Segmentation

Characters with Touching Boundaries

Some characters have touching boundaries, where the width of a character is greater than the average character width. They could not be segmented directly using ligature classification. Hence, the character boundary was defined using a modified vertical projection profile, constructed by calculating the distance between the top and bottom foreground pixels for each column of the image. Then three fourths of the average character width was subtracted from both ends of the image, and the low-density region in the remaining area was denoted as the character boundary. Figure 3.53 shows a sample step-by-step touching-boundary character segmentation process.

Figure 3.53 Touching Boundaries Character Segmentation (input image, ligature detection, ligature classification, vertical projection profile of dilation on the touching characters and vertical projection profile of the output image)

Thus a word was segmented into characters. The segmented characters were stored in a folder as individual JPEG files.
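The modified vertical projection profile used for touching boundaries, and the restricted search for the low-density cut point, can be sketched as below (an illustrative Python sketch with hypothetical names; the image is a list of 0/1 rows):

```python
def modified_vertical_profile(img):
    """Per-column distance between the top-most and bottom-most
    foreground pixels; a low valley in this profile marks the
    boundary inside a touching pair of characters."""
    prof = []
    for c in range(len(img[0])):
        rows = [r for r in range(len(img)) if img[r][c]]
        prof.append(rows[-1] - rows[0] + 1 if rows else 0)
    return prof

def touching_cut(profile, avg_char_width):
    """Trim 3/4 of the average character width from both ends and
    take the minimum of the profile in the remaining central region
    as the character boundary column."""
    margin = (3 * avg_char_width) // 4
    inner = profile[margin:len(profile) - margin]
    return margin + inner.index(min(inner))
```

Trimming the ends first keeps the search away from the natural thinning at the outer edges of the component, so the valley found is the touching point rather than a stroke tip.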

The character segmentation module produced erroneous outputs when the touching area of the characters was large, as well as for multi-touching components (Figure 3.54).

Figure 3.54 Larger Touching Areas and Multi Touching Components

The character segmentation module also failed to locate the valid segmentation points in words with extreme supremacy of ascenders and descenders in the core region (Figure 3.55). This might be due to various reasons, which can be analysed only through further research.

Figure 3.55 Extreme Supremacy of Ascender and Descender

Figure 3.56 portrays the comprehensive segmentation system in a nutshell.


More information

A New Approach To Fingerprint Recognition

A New Approach To Fingerprint Recognition A New Approach To Fingerprint Recognition Ipsha Panda IIIT Bhubaneswar, India ipsha23@gmail.com Saumya Ranjan Giri IL&FS Technologies Ltd. Bhubaneswar, India saumya.giri07@gmail.com Prakash Kumar IL&FS

More information

DESIGNING A REAL TIME SYSTEM FOR CAR NUMBER DETECTION USING DISCRETE HOPFIELD NETWORK

DESIGNING A REAL TIME SYSTEM FOR CAR NUMBER DETECTION USING DISCRETE HOPFIELD NETWORK DESIGNING A REAL TIME SYSTEM FOR CAR NUMBER DETECTION USING DISCRETE HOPFIELD NETWORK A.BANERJEE 1, K.BASU 2 and A.KONAR 3 COMPUTER VISION AND ROBOTICS LAB ELECTRONICS AND TELECOMMUNICATION ENGG JADAVPUR

More information

Frequency Distributions

Frequency Distributions Displaying Data Frequency Distributions After collecting data, the first task for a researcher is to organize and summarize the data so that it is possible to get a general overview of the results. Remember,

More information

ECG782: Multidimensional Digital Signal Processing

ECG782: Multidimensional Digital Signal Processing Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu ECG782: Multidimensional Digital Signal Processing Spring 2014 TTh 14:30-15:45 CBC C313 Lecture 03 Image Processing Basics 13/01/28 http://www.ee.unlv.edu/~b1morris/ecg782/

More information

Assignment 3: Edge Detection

Assignment 3: Edge Detection Assignment 3: Edge Detection - EE Affiliate I. INTRODUCTION This assignment looks at different techniques of detecting edges in an image. Edge detection is a fundamental tool in computer vision to analyse

More information

JAVA DIP - OPEN SOURCE LIBRARIES

JAVA DIP - OPEN SOURCE LIBRARIES JAVA DIP - OPEN SOURCE LIBRARIES http://www.tutorialspoint.com/java_dip/open_source_libraries.htm Copyright tutorialspoint.com In this chapter, we explore some of the free image processing libraries that

More information

A Survey of Problems of Overlapped Handwritten Characters in Recognition process for Gurmukhi Script

A Survey of Problems of Overlapped Handwritten Characters in Recognition process for Gurmukhi Script A Survey of Problems of Overlapped Handwritten Characters in Recognition process for Gurmukhi Script Arwinder Kaur 1, Ashok Kumar Bathla 2 1 M. Tech. Student, CE Dept., 2 Assistant Professor, CE Dept.,

More information

Edges and Binary Images

Edges and Binary Images CS 699: Intro to Computer Vision Edges and Binary Images Prof. Adriana Kovashka University of Pittsburgh September 5, 205 Plan for today Edge detection Binary image analysis Homework Due on 9/22, :59pm

More information

CSE/EE-576, Final Project

CSE/EE-576, Final Project 1 CSE/EE-576, Final Project Torso tracking Ke-Yu Chen Introduction Human 3D modeling and reconstruction from 2D sequences has been researcher s interests for years. Torso is the main part of the human

More information

Binary Image Processing. Introduction to Computer Vision CSE 152 Lecture 5

Binary Image Processing. Introduction to Computer Vision CSE 152 Lecture 5 Binary Image Processing CSE 152 Lecture 5 Announcements Homework 2 is due Apr 25, 11:59 PM Reading: Szeliski, Chapter 3 Image processing, Section 3.3 More neighborhood operators Binary System Summary 1.

More information

Applied Statistics for the Behavioral Sciences

Applied Statistics for the Behavioral Sciences Applied Statistics for the Behavioral Sciences Chapter 2 Frequency Distributions and Graphs Chapter 2 Outline Organization of Data Simple Frequency Distributions Grouped Frequency Distributions Graphs

More information

Performance Level Descriptors. Mathematics

Performance Level Descriptors. Mathematics Performance Level Descriptors Grade 3 Well Students rarely, Understand that our number system is based on combinations of 1s, 10s, and 100s (place value, compare, order, decompose, and combine using addition)

More information

Digital Image Processing Fundamentals

Digital Image Processing Fundamentals Ioannis Pitas Digital Image Processing Fundamentals Chapter 7 Shape Description Answers to the Chapter Questions Thessaloniki 1998 Chapter 7: Shape description 7.1 Introduction 1. Why is invariance to

More information

CHAPTER-4 LOCALIZATION AND CONTOUR DETECTION OF OPTIC DISK

CHAPTER-4 LOCALIZATION AND CONTOUR DETECTION OF OPTIC DISK CHAPTER-4 LOCALIZATION AND CONTOUR DETECTION OF OPTIC DISK Ocular fundus images can provide information about ophthalmic, retinal and even systemic diseases such as hypertension, diabetes, macular degeneration

More information

SEVERAL METHODS OF FEATURE EXTRACTION TO HELP IN OPTICAL CHARACTER RECOGNITION

SEVERAL METHODS OF FEATURE EXTRACTION TO HELP IN OPTICAL CHARACTER RECOGNITION SEVERAL METHODS OF FEATURE EXTRACTION TO HELP IN OPTICAL CHARACTER RECOGNITION Binod Kumar Prasad * * Bengal College of Engineering and Technology, Durgapur, W.B., India. Rajdeep Kundu 2 2 Bengal College

More information

6th Grade Vocabulary Mathematics Unit 2

6th Grade Vocabulary Mathematics Unit 2 6 th GRADE UNIT 2 6th Grade Vocabulary Mathematics Unit 2 VOCABULARY area triangle right triangle equilateral triangle isosceles triangle scalene triangle quadrilaterals polygons irregular polygons rectangles

More information

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order. Chapter 2 2.1 Descriptive Statistics A stem-and-leaf graph, also called a stemplot, allows for a nice overview of quantitative data without losing information on individual observations. It can be a good

More information

Image Segmentation. Segmentation is the process of partitioning an image into regions

Image Segmentation. Segmentation is the process of partitioning an image into regions Image Segmentation Segmentation is the process of partitioning an image into regions region: group of connected pixels with similar properties properties: gray levels, colors, textures, motion characteristics

More information

Segmentation and Grouping

Segmentation and Grouping Segmentation and Grouping How and what do we see? Fundamental Problems ' Focus of attention, or grouping ' What subsets of pixels do we consider as possible objects? ' All connected subsets? ' Representation

More information

ECG782: Multidimensional Digital Signal Processing

ECG782: Multidimensional Digital Signal Processing Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu ECG782: Multidimensional Digital Signal Processing Spring 2014 TTh 14:30-15:45 CBC C313 Lecture 10 Segmentation 14/02/27 http://www.ee.unlv.edu/~b1morris/ecg782/

More information

A New Algorithm for Detecting Text Line in Handwritten Documents

A New Algorithm for Detecting Text Line in Handwritten Documents A New Algorithm for Detecting Text Line in Handwritten Documents Yi Li 1, Yefeng Zheng 2, David Doermann 1, and Stefan Jaeger 1 1 Laboratory for Language and Media Processing Institute for Advanced Computer

More information

Types of image feature and segmentation

Types of image feature and segmentation COMP3204/COMP6223: Computer Vision Types of image feature and segmentation Jonathon Hare jsh2@ecs.soton.ac.uk Image Feature Morphology Recap: Feature Extractors image goes in Feature Extractor featurevector(s)

More information

Segmentation of Bangla Handwritten Text

Segmentation of Bangla Handwritten Text Thesis Report Segmentation of Bangla Handwritten Text Submitted By: Sabbir Sadik ID:09301027 Md. Numan Sarwar ID: 09201027 CSE Department BRAC University Supervisor: Professor Dr. Mumit Khan Date: 13 th

More information

15 Wyner Statistics Fall 2013

15 Wyner Statistics Fall 2013 15 Wyner Statistics Fall 2013 CHAPTER THREE: CENTRAL TENDENCY AND VARIATION Summary, Terms, and Objectives The two most important aspects of a numerical data set are its central tendencies and its variation.

More information

N.Priya. Keywords Compass mask, Threshold, Morphological Operators, Statistical Measures, Text extraction

N.Priya. Keywords Compass mask, Threshold, Morphological Operators, Statistical Measures, Text extraction Volume, Issue 8, August ISSN: 77 8X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Combined Edge-Based Text

More information

Clustering CS 550: Machine Learning

Clustering CS 550: Machine Learning Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf

More information

Automatic Grayscale Classification using Histogram Clustering for Active Contour Models

Automatic Grayscale Classification using Histogram Clustering for Active Contour Models Research Article International Journal of Current Engineering and Technology ISSN 2277-4106 2013 INPRESSCO. All Rights Reserved. Available at http://inpressco.com/category/ijcet Automatic Grayscale Classification

More information

Fine Classification of Unconstrained Handwritten Persian/Arabic Numerals by Removing Confusion amongst Similar Classes

Fine Classification of Unconstrained Handwritten Persian/Arabic Numerals by Removing Confusion amongst Similar Classes 2009 10th International Conference on Document Analysis and Recognition Fine Classification of Unconstrained Handwritten Persian/Arabic Numerals by Removing Confusion amongst Similar Classes Alireza Alaei

More information

CHAPTER 4 SEGMENTATION

CHAPTER 4 SEGMENTATION 69 CHAPTER 4 SEGMENTATION 4.1 INTRODUCTION One of the most efficient methods for breast cancer early detection is mammography. A new method for detection and classification of micro calcifications is presented.

More information

Image representation. 1. Introduction

Image representation. 1. Introduction Image representation Introduction Representation schemes Chain codes Polygonal approximations The skeleton of a region Boundary descriptors Some simple descriptors Shape numbers Fourier descriptors Moments

More information

September 11, Unit 2 Day 1 Notes Measures of Central Tendency.notebook

September 11, Unit 2 Day 1 Notes Measures of Central Tendency.notebook Measures of Central Tendency: Mean, Median, Mode and Midrange A Measure of Central Tendency is a value that represents a typical or central entry of a data set. Four most commonly used measures of central

More information

Topic 4 Image Segmentation

Topic 4 Image Segmentation Topic 4 Image Segmentation What is Segmentation? Why? Segmentation important contributing factor to the success of an automated image analysis process What is Image Analysis: Processing images to derive

More information