Document Image Restoration Using Binary Morphological Filters

Jisheng Liang, Robert M. Haralick
University of Washington, Department of Electrical Engineering
Seattle, Washington 98195

Ihsin T. Phillips
Seattle University, Department of Computer Science
Seattle, Washington 98122

ABSTRACT

This paper discusses a method for designing binary morphological filters that restore document images degraded by subtractive or additive noise, given a constraint on the size of the filters. With a filter size restriction (for example, 3×3), each pixel in the output image depends only on its 3×3 neighborhood in the input image. Therefore, we can construct a look-up table between input and output, and each output image pixel is determined by this table. Filter design thus becomes the search for the optimal look-up table. By considering the degradation condition of the input image, we provide a methodology for knowledge-based look-up table design that makes the search computationally tractable. The methodology can be applied iteratively, so that the final output image is the input image after being transformed through successive 3×3 operations. An experimental protocol is developed for restoring degraded document images and improving the corresponding recognition accuracy of an OCR algorithm. We present results for a set of real images which were manually ground-truthed. The performance of each filter is evaluated by the OCR accuracy.

Keywords: document image restoration, morphological filtering, OCR, look-up table.

1 INTRODUCTION

During document image generation processes such as scanning and digitization, images are usually corrupted by subtractive as well as additive noise. We wish to design a filter that uses morphological operations to restore a class of document images with similar structural features and degradation conditions.

Earlier work on image restoration can be divided into two groups. Linear filters are based on the assumption of linear, space-invariant degradations.
The restoration can be carried out in the frequency domain by means of a 2-D FFT algorithm. A linear filter is easy to design and analyze, and classical Wiener filtering provides a mathematically simple and computationally efficient optimization process. But document images often have sharp edges, binary structures, and a limited number of bits per pixel, so the restriction that the estimation rule be a linear combination of observed values is typically unsuitable [2].

Mathematical formalisms for the practical design of nonlinear optimal filters are not generally available. Optimal nonlinear filters are often found by computationally intensive search procedures. To mitigate the search problem, Loce [2] has employed various optimization constraints, such as a window constraint and a library constraint.

In this paper we discuss a method for binary morphological filter design to restore document images degraded by subtractive or additive noise, given a constraint on the size of the filters. With the filter size restriction (for example, 3×3), each pixel in the output image depends only on its 3×3 neighborhood in the input image. Therefore, we can construct a look-up table between input and output, and each output image pixel is determined by this table. Filter design thus becomes the search for the optimal look-up table. By considering the degradation condition of the input image, we provide a methodology for knowledge-based look-up table design that makes the search computationally tractable.

Earlier restoration techniques are oriented toward modeling the degradation and applying the inverse process in order to recover the original image. This approach usually involves formulating a criterion of goodness that will yield some optimal estimate of the desired result [1]. The main difficulty is the validation of the degradation model.

2 PROBLEM STATEMENT

Let $I$ denote the real input image and $R$ the restored output image. Any morphological filtering can be expressed as a mapping from $I$ to $R$:

$$R = F(I).$$

Our objective is to search for an optimal filter $F$ that restores the degraded input image and improves the corresponding recognition accuracy of an OCR algorithm. The filter is constrained to a 3×3 size.
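To make the mapping $R = F(I)$ concrete, the following is a minimal sketch of how a 3×3 look-up table filter could be applied to a binary image. The function names, the row-major bit ordering of the neighborhood, and the zero padding at the image border are assumptions made for illustration; the paper does not prescribe them. The hole-filling rule in the example is a hypothetical table, not one of the optimal tables reported later.

```python
def pattern_index(image, r, c):
    """Return the 9-bit index of the 3x3 neighborhood centered at (r, c).

    Assumption: pixels are packed row-major with the top-left pixel as the
    most significant bit, and pixels outside the image count as 0.
    """
    rows, cols = len(image), len(image[0])
    index = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < rows and 0 <= cc < cols
            index = (index << 1) | (image[rr][cc] if inside else 0)
    return index


def apply_lut(image, lut):
    """Apply a 512-entry look-up table to a binary image (list of lists)."""
    return [
        [lut[pattern_index(image, r, c)] for c in range(len(image[0]))]
        for r in range(len(image))
    ]


# Identity table: the output is the center pixel (bit 4 in this packing).
identity_lut = [(p >> 4) & 1 for p in range(512)]


def fills_hole(p):
    """Hypothetical rule: keep foreground, fill a pixel whose 4 neighbors are all 1."""
    center = (p >> 4) & 1
    north, west, east, south = (p >> 7) & 1, (p >> 5) & 1, (p >> 3) & 1, (p >> 1) & 1
    return 1 if center or (north and west and east and south) else 0


fill_lut = [fills_hole(p) for p in range(512)]

noisy = [[1, 1, 1],
         [1, 0, 1],
         [1, 1, 1]]
restored = apply_lut(noisy, fill_lut)
```

Because every output pixel is a single table look-up, changing the filter only means changing the 512 table entries, which is why the design problem reduces to a search over tables.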
3 FILTER DESIGN ALGORITHM

With the 3×3 size restriction, each pixel $k$ in the output image depends only on its 3×3 neighborhood in the input image (see Figure 1):

$$k = f(k_1, k_2, k_3, k_4, k_5, k_6, k_7, k_8, k_9),$$

where $k_1, \ldots, k_9$ are the elements of the 3×3 neighborhood shown in Table 1.

Figure 1: Mapping between input and output images (a 3×3 neighborhood of the input image I maps to a binary 1 or 0 in the output image R).

    k1  k2  k3
    k4  k5  k6
    k7  k8  k9

    Table 1: 3×3 pattern

We call each possible 3×3 neighborhood a pattern. Each input pattern is associated with a binary 1 or 0 on the output. Therefore, we can construct a look-up table between input and output (see Table 2).

    k1, k2, k3, k4, k5, k6, k7, k8, k9    k
    000000000                             0 or 1
    100000000                             0 or 1
    ...                                   0 or 1
    111111111                             0 or 1

    Table 2: Look-up table between input and output

Each output image pixel is determined by this look-up table, so filter design becomes the search for an optimal look-up table. However, there are $2^9 = 512$ different patterns and $2^{512}$ possible tables. Searching through all the possible combinations is computationally infeasible.

3.1 Generation of Knowledge-Based Table

We have developed a methodology to generate a knowledge-based table by considering the degradation type of the input image. We partition the 512 possible input patterns into $L$ different subsets, by considering the number of 4-connected components in each pattern. 4-connected means that only the north, south, east, and west neighbors of a pixel are considered connected to it. All patterns in a subset should produce the same output. If we have some knowledge of the degradation process and the nature of the ideal input image, then, by considering the contribution of each subset to restoration, we can determine $m$ subsets which must produce binary-1 outputs and $n$ subsets which must produce binary-0 outputs. There remain $L - m - n$ subsets for which we must determine whether it is better for the output to be binary-1 or binary-0. We search through all the possible combinations and choose the optimal one. The size of the search thus decreases from $2^L$ to $2^{L-m-n}$.

3.1.1 Table Generation for Subtractive Noise

We assume that the noise process is one that creates holes in the image, so whether a 3×3 pattern produces output 1 or 0 depends on its contribution to hole filling. For subtractive noise, all 3×3 patterns whose central pixel is 1 should produce binary-1 outputs. Filters with this property are called foreground preservers.
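The partition described in Section 3.1 relies on counting 4-connected components in a 3×3 pattern. The sketch below shows one way this count could be computed. It assumes that components are counted over the 1-pixels of the pattern and that a pattern is given as a 9-tuple in the row-major order of Table 1; both are assumptions for illustration, and the grouping by count shown here is coarser than the 50 subsets used in the paper.

```python
from collections import Counter
from itertools import product


def count_components_4(pattern):
    """Count 4-connected components of 1-pixels in a 3x3 pattern.

    `pattern` is a 9-tuple (k1, ..., k9) in row-major order (an assumption).
    """
    seen = set()
    count = 0
    for start in range(9):
        if pattern[start] != 1 or start in seen:
            continue
        count += 1
        stack = [start]
        seen.add(start)
        while stack:
            pos = stack.pop()
            r, c = divmod(pos, 3)
            # Only north, south, west, and east neighbors are connected.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                npos = nr * 3 + nc
                if 0 <= nr < 3 and 0 <= nc < 3 and pattern[npos] == 1 and npos not in seen:
                    seen.add(npos)
                    stack.append(npos)
    return count


# Group the patterns whose central pixel (k5) is 0 by their component count.
histogram = Counter(
    count_components_4(bits)
    for bits in product((0, 1), repeat=9)
    if bits[4] == 0
)
```

With the central pixel fixed at 0, the four corner pixels (or the four edge pixels) are mutually disconnected under 4-connectivity, so a pattern can contain at most four components.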
We then partition all remaining patterns into 50 subsets, by the number of 4-connected components in each pattern. By considering their contribution to the hole filling process, we determine some subsets which must produce binary-1 outputs. These subsets are called hole fillers, and all of them are listed in Figure 2 and Figure 3. In order to prevent overfilling, some subsets must produce binary-0 outputs to preserve the background. A list of these background preservers is shown in Figure 4. The remaining subsets, for which we do not know whether the output should be binary-1 or binary-0, are listed in Figure 5.

Figure 2: Subsets of 3×3 patterns which are hole fillers (patterns with 1 and 2 components). The number below each 3×3 pattern represents the number of rotations of each pattern.

Figure 3: Subsets of 3×3 patterns which are hole fillers (patterns with 2, 3, and 4 components).

The generated knowledge-based table for subtractive noise corrupted images is shown in Table 3. For the remaining subsets, we must check whether it is better for the output to be binary-1 or binary-0. We search through all the possible combinations and choose the optimal one. However, the size of the search ($2^{15}$) may still be too large. Subsets which make a similar contribution to hole filling can be combined into one set. With $N$ sets, we search through $2^N$ combinations. The remaining subsets, combined into 6 sets, are shown in Figure 6, so the size of the search decreases to $2^6 = 64$. In practice, one can employ a different partition based on knowledge of the degradation process.
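The reduced search can be sketched as an enumeration over the $2^N$ ways of assigning an output to the $N$ undetermined sets, with the knowledge-based entries held fixed. The function name, the argument layout, and the toy 8-entry table in the example are hypothetical; the actual membership of each set is given by the figures and is not reproduced here.

```python
from itertools import product


def candidate_tables(fixed_one, fixed_zero, free_sets, n_patterns=512):
    """Yield (assignment, table) for every 0/1 assignment of the free sets.

    fixed_one / fixed_zero: pattern indices whose output is known in advance.
    free_sets: list of N lists of pattern indices; each list shares one output.
    """
    for assignment in product((0, 1), repeat=len(free_sets)):
        table = [None] * n_patterns
        for p in fixed_one:
            table[p] = 1
        for p in fixed_zero:
            table[p] = 0
        for group, value in zip(free_sets, assignment):
            for p in group:
                table[p] = value
        if None in table:
            raise ValueError("every pattern must be covered by some subset")
        yield assignment, table


# Toy example with 8 patterns instead of 512: two free sets give 2**2 tables.
toy_tables = list(
    candidate_tables(fixed_one=[4, 5, 6, 7], fixed_zero=[0],
                     free_sets=[[1], [2, 3]], n_patterns=8)
)
```

With 6 free sets the generator yields $2^6 = 64$ tables, each of which can be evaluated independently.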
Figure 4: Subsets of 3×3 patterns which are foreground and background preservers, for subtractive noise corrupted images.

Figure 5: Remaining subsets for subtractive noise corrupted images.

3.1.2 Table Generation for Additive Noise

With additive noise, all the points of the ideal image belong to the noisy image, and we wish to remove the additive points. To do this, we first require that all patterns whose central pixel is 0 produce binary-0 outputs. Filters with this property are called background preservers. We then partition all the remaining patterns into a number of different subsets. Since the noise process is one that creates additive points, we determine some subsets which can be used to remove these points. These subsets must produce binary-0 outputs (see Figure 7). In order to preserve the structural shape of the ideal characters during noise removal, we also determine a list of foreground preservers which must produce binary-1 outputs (see Figure 7). The remaining subsets, combined into 7 sets, are shown in Figure 8. We search through all the combinations of these 7 sets, so the size of the search is $2^7 = 128$. Again, in practice, one can employ a different partition based on knowledge of the degradation process. The generated knowledge-based table for additive noise corrupted images is shown in Table 4.
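The two center-pixel rules (central pixel 1 gives output 1 for subtractive noise; central pixel 0 gives output 0 for additive noise) already fix half of the 512 table entries. The sketch below illustrates this, assuming a row-major 9-bit packing in which the central pixel is bit 4; the function name and the use of `None` for undetermined entries are illustrative choices, not part of the original method.

```python
def partial_table(noise_type):
    """Return a 512-entry table with only the center-pixel rule applied.

    Entries that the rule does not determine are left as None.
    Assumption: patterns are packed row-major, so the center is bit 4.
    """
    if noise_type not in ("subtractive", "additive"):
        raise ValueError("noise_type must be 'subtractive' or 'additive'")
    table = [None] * 512
    for p in range(512):
        center = (p >> 4) & 1
        if noise_type == "subtractive" and center == 1:
            table[p] = 1  # foreground preserver
        elif noise_type == "additive" and center == 0:
            table[p] = 0  # background preserver
    return table


subtractive_table = partial_table("subtractive")
additive_table = partial_table("additive")
undetermined = sum(entry is None for entry in subtractive_table)
```

In both cases 256 entries remain open, and these are the patterns that the hole fillers, noise removers, preservers, and remaining sets then divide among themselves.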
Figure 6: Remaining subsets for subtractive noise corrupted images, combined into 6 sets (Set 1 to Set 6).

    Subsets                   Output pixel
    Central pixel is 1        1
    Hole fillers              1
    Background preservers     0
    Remaining subsets         0 or 1

    Table 3: Look-up table for subtractive noise corrupted images

    Subsets                   Output pixel
    Central pixel is 0        0
    Noise removers            0
    Foreground preservers     1
    Remaining subsets         0 or 1

    Table 4: Look-up table for additive noise corrupted images

4 EXPERIMENTAL PROTOCOL

We select a number of real document images as the input images. They have different degradation types and degradation conditions. The zone-based ground truth files for the input images are generated following the protocol used in the UW English Document Image Database I. We draw the bounding box for each text zone and extract the box coordinates, which gives a list of identifiers for the zones and the corresponding bounding box coordinates. The generated ground truth file serves as the ideal reference for restoring the input images and for testing.
Figure 7: Foreground preservers and noise removers for additive noise corrupted images.

The procedure for filter design is as follows:

1. Find the degradation type of the input image.
2. Generate the knowledge-based look-up tables, using the algorithm discussed in Section 3. For each possible table, there is one output image.
3. Run the OCR algorithm on each of the output images.
4. Evaluate each look-up table by comparing the OCR outputs with the ground truth of the input image, using the OPE (OCR Performance Evaluation) software.
5. A new filter can be designed to work on the output of the previous filtering step.

The experiment flow chart is shown in Figure 9.

4.1 Testing

We use Caereocr OCR Version 109a, from Caere Systems, and specify the coordinates of the zones to be recognized. Script files to run the OCR algorithm on each of the zones in an image are generated and executed.

The OPE software [6] is provided by UWEDID-I. It compares the OCR outputs with the corresponding ground truth information, generating symbol statistics, such as the number of matches, changes, insertions, and deletions, as well as line statistics. In addition, it optionally generates a contingency table which tells how each character is interpreted by the OCR algorithm.

We use symbol match accuracy as the primary index in filter testing. Let $G(I)$ and $O(R)$ denote the ground truth of the input image and the output of the OCR algorithm over the restored image, respectively. Let $M(G(I), O(R))$ represent the symbol match accuracy between $G(I)$ and $O(R)$. Our testing criterion is therefore

$$\max \{ M(G(I), O(R)) \}.$$
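The testing criterion amounts to scoring every candidate table and keeping the best one. The sketch below illustrates that selection loop. Since the OPE software and the OCR engine are not available here, `symbol_match_accuracy` is a stand-in built on Python's `difflib` rather than OPE's actual matching, and `restore` and `run_ocr` are hypothetical callables supplied by the caller.

```python
from difflib import SequenceMatcher


def symbol_match_accuracy(ground_truth, ocr_text):
    """Percentage of ground-truth symbols matched in the OCR output.

    A stand-in for M(G(I), O(R)); it is not the OPE algorithm.
    """
    if not ground_truth:
        return 0.0
    matcher = SequenceMatcher(None, ground_truth, ocr_text, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(ground_truth)


def best_table(tables, restore, run_ocr, ground_truth):
    """Return (accuracy, table) for the table with the highest accuracy.

    restore(table) -> restored image, run_ocr(image) -> text (both hypothetical).
    """
    best = None
    for table in tables:
        score = symbol_match_accuracy(ground_truth, run_ocr(restore(table)))
        if best is None or score > best[0]:
            best = (score, table)
    return best


# Toy usage with stubs: each "table" is a label and OCR output is canned.
canned_ocr = {"table_a": "abxd", "table_b": "abcd"}
result = best_table(
    tables=["table_a", "table_b"],
    restore=lambda table: table,
    run_ocr=lambda image: canned_ocr[image],
    ground_truth="abcd",
)
```

The loop is deliberately independent of how tables are generated, so the same selection step can be reused at each level of recursive filtering.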
Figure 8: Remaining subsets for additive noise corrupted images, combined into 7 sets (Set 1 to Set 7).

Figure 9: Experiment flow chart (input image, filter, restored image, OCR text, ground truth text, OPE, symbol match accuracy).

5 EXPERIMENTAL RESULTS

We choose 15 real images with different degradation types and conditions to verify our algorithm. The degradation type of each input image has been defined. When either subtractive or additive noise alone is present, the appropriate filter can be used. A new filter can be designed to work on the output of the previous filtering step. For example, four levels of recursive filtering are performed on image 1 in our experiment. If the input has mixed noise, alternating sequential filters can be used. For example, an additive noise filter followed by a subtractive noise filter is applied to image 6.

During each filtering level, we search through all the possible look-up tables (64 for subtractive noise, and 128 for additive noise, see Section 3). For each table, there is one output image and a corresponding symbol match accuracy of the OCR algorithm. The optimal look-up table is determined from 6 (7) input subsets for subtractive (additive) noise. For each input image, the output pixel value (binary 1 or 0) of each input subset is listed in a table.

The following are the experimental results for the 15 input images with different dominant noise types.

1. Image 1: subtractive noise.

    1   65.98   1,4,6       2,3           71.39
    2           1,3,4,6     2             77.06
    3           2,4,6       1,3           78.09
    4           2,4,6       1,3           78.61

2. Image 2: subtractive noise.

    1   90.38   1,2,4,6     3,5           93.77

3. Image 3: additive noise.

    1   39.7    6           1,2,3,4,5,7   57.89

4. Image 4: additive noise.

    1   93.28   1,4,5,7     2,3,6         96.97
    2           2,5         1,3,4,6,7     97.05

5. Image 5: additive noise.

    1   51.78   2,4,6       1,3,5,7       68.76

6. Image 6: mixed noise.

Two levels of recursive filtering are performed on image 6 in our experiment. The first one is used to remove the additive noise. The second one is used to clean the subtractive noise. This method is comparable to morphological opening and closing filtering. The optimal filters are listed in the following table.

    1   59.87   2,3,7       1,4,5,6       71.71
    2           1,3,5       2,4,6         73.6
7. Image 7: subtractive noise.

    1   7.03    2,3,4,5,6   1             8.76
    2           1,2,4,5,6   3             85.8

8. Image 8: additive noise.

    1   87.85   2,4,7       1,3,5,6       91.81

9. Image 9: additive noise.

    1   67.3    3,4,5,6,7   1,2           68.51

10. Image 10: subtractive noise.

    1   55.79   2,3,4,5     1,6           6.19
    2           4,5,6       1,2,3         66.67

11. Image 11: subtractive noise.

    1   73.86   1,2,3,5,6   82.63

12. Image 12: subtractive noise.

    1   88.20   2,3,4,5     1,6           95.28

13. Image 13: mixed noise.

Two levels of recursive filtering are performed on image 13 in our experiment. The first one is used to remove the additive noise. The second one is used to clean the subtractive noise.

    1   87.61   1,2,3,4,5,6 7             90.19
    2           2,3,4,5     1,6           92.77

14. Image 14: additive noise.

    1   63.03   1,2,4,6     3,5,7         70.77
15. Image 15: additive noise.

    1   53.12   1,2,3,5,6,7   66.36
    2           1,3,5,6,7     66.2

We have worked on the individual images and found, for each, the optimal image-specific filter which produces the highest OCR accuracy. Of course, in real applications, it is not realistic to search for the best filter for each image. We want to find the optimal filter for a set of images with similar properties. Therefore, we divided the images into two sets, one with subtractive noise and one with additive noise. The OCR accuracies from the 64 3×3 filters for subtractive noise and the 128 3×3 filters for additive noise were computed. The average OCR accuracy (for the first filtering level only) was computed for the additive and subtractive cases. The filter which produces the highest average accuracy for each set of images was chosen as the global filter (Table 5). For each image, the accuracy obtained with the global filter is given in Table 6 and Table 7.

    Image Type           Sets with output 1    Sets with output 0
    Subtractive Noise    1,2,3,4,6             5
    Additive Noise       1                     2,3,4,5,6,7

    Table 5: Global Filters for Subtractive Noise Images and Additive Noise Images

    Image Number       2       7      10      11      12
    After filtering    93.77   83.8   63.36   81.30   9.9

    Table 6: OCR Accuracy for Subtractive Noise Images

    Image Number       3       4       5       6       8      9       13      14     15
    After filtering    57.11   96.07   67.92   70.61   91.2   68.51   89.67   70.3   65.90

    Table 7: OCR Accuracy for Additive Noise Images

Let $x_i$ be the OCR accuracy obtained after employing the best image-specific filter on image $i$, and let $\hat{x}_i$ be the OCR accuracy obtained after employing the best global filter on image $i$. The mean accuracy loss (MAL) is calculated as

$$\mathrm{MAL} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{x}_i).$$

The mean accuracy loss for employing the global subtractive noise filter rather than the best image-specific filter was 0.76%. The mean accuracy loss for employing the global additive noise filter rather than the best image-specific filter was 0.62%.
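The global filter selection and the mean accuracy loss can be sketched as follows. The function names and the nested-list layout of the accuracies are assumptions, and the numbers in the example are made-up toy values chosen for easy checking, not the accuracies reported in Tables 6 and 7.

```python
def choose_global_filter(accuracy):
    """Return the index of the filter with the highest average accuracy.

    accuracy[f][i] is the OCR accuracy of filter f on image i.
    """
    means = [sum(row) / len(row) for row in accuracy]
    return max(range(len(means)), key=means.__getitem__)


def mean_accuracy_loss(specific_acc, global_acc):
    """MAL = (1/n) * sum(x_i - xhat_i) over the n images."""
    if not specific_acc or len(specific_acc) != len(global_acc):
        raise ValueError("both lists must be non-empty and of equal length")
    losses = [x - x_hat for x, x_hat in zip(specific_acc, global_acc)]
    return sum(losses) / len(losses)


# Toy values: two candidate filters evaluated on two images.
toy_accuracy = [
    [90.0, 50.0],  # filter 0: best on image 0 only
    [89.0, 80.0],  # filter 1: best on average
]
global_index = choose_global_filter(toy_accuracy)
specific = [max(column) for column in zip(*toy_accuracy)]
toy_mal = mean_accuracy_loss(specific, toy_accuracy[global_index])
```

Each term of the sum is non-negative when the image-specific filter is chosen from the same candidate set as the global filter, so a small MAL indicates that little is lost on any single image.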
This shows that the global filter works well for each individual image.

6 DISCUSSION

An algorithm for binary morphological filter design has been given to restore document images degraded by subtractive or additive noise, given design constraints on the size of the filters. We provided a methodology for knowledge-based look-up table design that makes the search computationally tractable. An experimental protocol has been developed for restoring degraded document images and improving the corresponding recognition accuracy of an OCR algorithm.

Our future work will include investigating the effect on ideal characters when each kind of pattern is applied, and the effect of constraints on filter properties. We also plan to develop a classifier that assigns input images to different subsets and selects the appropriate filter for each subset.

7 REFERENCES

[1] R.C. Gonzalez and R.E. Woods, Digital Image Processing, Addison Wesley, 1993.
[2] R.P. Loce, "Morphological Filter Mean-Absolute-Error Representation Theorems and Their Application to Optimal Morphological Filter Design", Ph.D. Thesis, Rochester Inst. of Tech., 1994.
[3] I.R. Joughin, R.M. Haralick, and E.R. Dougherty, "Model-based Algorithm for Designing Suboptimal Morphological Filters for Restoring Subtractive-noise-corrupted Images", Journal of Electronic Imaging, vol. 2, no. 4, pp. 314-325, Oct. 1993.
[4] S. Chen, S. Subramaniam, I.T. Phillips, and R.M. Haralick, "Performance Evaluation of Two OCR Systems", Third Annual Symposium on Document Analysis and Information Retrieval, pp. 299-317, April 1994.
[5] "Reference Manual for UW English Document Image Database", ISL Report, E.E. Dept., U. of Washington.
[6] Su Chen, "OCR Performance Evaluation Software User's Manual", ISL Report, E.E. Dept., U. of Washington.