Bi-level Image Watermarking and Distortion Measure


Bi-level Image Watermarking and Distortion Measure

Lu Haiping

School of Electrical & Electronic Engineering

A thesis submitted to the Nanyang Technological University in fulfillment of the requirements for the degree of Master of Engineering

2003

Statement of Originality

I hereby certify that the work embodied in this thesis is the result of original research done by me and has not been submitted for a higher degree to any other University or Institute.

Date                                        Lu Haiping

To my beloved Xiaoyi.
To my parents and sisters.

Acknowledgements

I would like to express my sincere gratitude to my supervisor, Prof. Kot Chichung, Alex, for his guidance, nurturing, encouragement and support in every stage of my graduate study. His knowledge, kindness, patience, and vision have provided me with lifetime benefits. I am very grateful to Prof. Shi Yun-Qing for guiding me into the field of digital watermarking and image processing, as well as for his continuous encouragement and help in my study and research. I want to take this opportunity to thank Prof. Chen Lihui for her valuable comments and suggestions on my research work. I would also like to thank her graduate student Shi Xuxia for providing details on her work. Special thanks to Wang Jian for his help in my research. I also thank my fellow graduate students Yang Huijuan, Zeng Jiancheng, and Cheng Jun for their help in my graduate study. Finally, I want to thank my parents and sisters, and my dear Xiaoyi, for their love and support, without which I could never have accomplished all this. Thanks to all the people who have ever helped and encouraged me.

Summary

In the past several years, there has been rapid growth in the use of digital media. Not only are gray-level/color images, video and audio in digital form; bi-level images are also digitized in applications involving electronic documents such as digitized handwritten signatures, legal and financial documents, digital books, maps, and architectural and electronic drawings. This explosive digital world has generated new challenges while offering many advantages. One major concern is the protection of intellectual property rights. Digital watermarking techniques address this concern by embedding digital information about the origin, status, and/or destination within the digital media content. This thesis discusses an area in this field that is less well addressed in the literature: bi-level image watermarking. We focus on binary document images in particular. We propose a new objective distortion measure for evaluating visual distortion in binary document images. Subjective testing has shown that this measure correlates better with human visual perception of binary document images than the commonly used peak signal-to-noise ratio (PSNR). Such an objective measure is very helpful in the design and evaluation/comparison of binary image watermarking algorithms. A fragile watermarking algorithm is proposed based on this distortion measure, and its tamper-proofing function has been demonstrated through simulations.

The validity of a watermarking approach operating on the DC components of the Discrete Cosine Transform (DCT) is derived. Based on the insights from this derivation, a simplified watermarking algorithm is developed, and experiments show that it has some robustness against cropping and additive noise. We also study the feasibility of embedding watermarks in the AC components of the DCT, a common approach for gray-level/color images that seems hardly feasible for binary images. We propose a watermarking scheme using this approach. Our results show that it can embed watermarks successfully with some robustness.

Contents

Acknowledgements i
Summary ii
List of Figures vii
List of Tables x

1 Introduction
   1.1 Digital Watermarking of Multimedia Content
   1.2 Motivation
   1.3 Objectives
   1.4 Thesis Organization and Contributions

2 Preliminaries
   2.1 Categorization of Binary Images
   2.2 Classification of Digital Watermarking Techniques
   2.3 Previous Methods on Binary Image Watermarking
   2.4 The Proposed Algorithms and Assumptions

3 Distance-Reciprocal Distortion Measure
   3.1 HVS Models
   3.2 Prior Art on Distortion Measures
      3.2.1 Traditional Objective Distortion Measures
      3.2.2 Other Objective Measures
      3.2.3 Subjective Testing Methods
   3.3 Document Images in Human Eyes
   3.4 The Distance-Reciprocal Distortion Measure
   3.5 Experimental Results
   3.6 Chapter Summary

4 Fragile Watermarking Based on DRDM
   4.1 Introduction
   4.2 Effects of Varying Weight Matrix Size
   4.3 Fragile Watermarking Based on DRDM
      D Shifting
      Watermark Embedding
      Embedding Capacity
      Watermark Extraction
   4.4 Experimental Results
   4.5 Chapter Summary

5 Watermarking through Biased Binarization
   5.1 Watermark Embedding in DC Components of DCT
      The Feasibility of DC Components Embedding
      Successful Watermark Embedding in DC Components
      Experimental Results
   5.2 Watermarking through Biased Binarization without Using DCT
      Blurring
      Watermarking through Direct Biased Binarization
      Watermark Extraction
      Experimental Results
   5.3 Chapter Summary

6 Watermarking in Frequency Domain
   6.1 Difficulties in Frequency-Domain Approach
   6.2 Important Considerations on the Frequency-Domain Approach
      Block Size in Processing
      Observation on Non-Embeddable Blocks
      The Embedding Strength
      Exhaustive Study on Blocks of …
   6.3 A Simple Watermarking Algorithm in the DCT Domain
      Watermark Embedding
      Watermark Extraction
      The Exhaustive Study Results
   6.4 Enhancement of the AC Embedding Algorithm
      AC Coefficient Modification
      Biased Binarization according to Spatial and Frequency Domain Characteristics
      Performance Improvement in Exhaustive Study Results
   6.5 Experimental Results
   6.6 Chapter Summary

7 Conclusion and Recommendations
   7.1 Summary and Conclusion
   7.2 Recommendations for Future Work
      Extension to Halftone Images
      Further Development of the Distortion Measure
      Further Development of the Watermarking Algorithms
      Invertible Watermarking

Author's Publications 102
Bibliography 104

List of Figures

1.1 Generic digital watermark embedding scheme
1.2 Generic digital watermark detection scheme
2.1 CCITT test images 1 to 4
2.2 CCITT test images 5 to 8
2.3 Classification of digital watermarking techniques
2.4 Test image cut from CCITT4 for watermarking experiments
3.1 Illustration of visual distortion in a document image
3.2 Original binary document image for subjective testing of visual distortion
3.3 Distortion generation for subjective testing
3.4 One set of test images for subjective testing
3.5 Distribution of subjective ranking scores
4.1 Distribution of pixels (assuming flipped) with DRD < 0.5 for the non-uniform 8 × 8 blocks of the image shown in Fig. 4.3(a)
4.2 The proposed fragile watermarking algorithm for binary document images based on DRDM
4.3 Original and watermarked binary document images for the proposed fragile watermarking method based on DRDM
4.4 Flipped pixels for the center portion of Fig. 4.3(b)
4.5 Tamper proofing of the tampered image
5.1 DC components embedding for binary images by X. Shi (2001)
5.2 The probability of D_k^WG(k) > T_mid^k against T_mid^k / C_b^k(0, 0) for α = …
5.3 The count of the non-uniform 8 × 8 blocks in the eight CCITT binary test images over the range of T_mid^k / C_b^k(0, 0)
5.4 Original and watermarked binary images for the method by X. Shi (2001)
5.5 Detector response for the watermarked image of Fig. 5.4(b)
5.6 The proposed watermarking method using biased binarization
5.7 Loop to control the embedding in the proposed watermarking algorithm using biased binarization
5.8 Original and watermarked binary images for the proposed watermarking method using biased binarization
5.9 Pixels flipped in the watermarked image of Fig. 5.8(b)
5.10 Results of watermarking through biased binarization for cropping
5.11 Results of watermarking through biased binarization for additive white Gaussian noise degradation
6.1 AC coefficient magnitudes of DCT for a gray-level image
6.2 AC coefficient magnitudes of DCT for a binary image
6.3 The proposed DCT-domain watermarking algorithm for binary images
6.4 The classification of the AC coefficients of DCT
6.5 Comparison of V_Ratio before and after enhancement
6.6 Comparison of F_Max before and after enhancement
6.7 Comparison of F_Ave before and after enhancement
6.8 Comparison of DRD_Max before and after enhancement
6.9 Comparison of DRD_Ave before and after enhancement
6.10 Original and watermarked binary images for the proposed DCT-domain watermarking algorithm
6.11 Flipped pixels in the watermarked image in Fig. 6.10(b)
6.12 Results of the proposed DCT-domain watermarking for cropping
6.13 Results of the proposed DCT-domain watermarking for additive white Gaussian noise degradation

List of Tables

3.1 Weight matrix before normalization (m = 5)
3.2 Weight matrix after normalization (m = 5)
3.3 Experimental results of the subjective testing for the proposed distortion measure
6.1 Exhaustive study results for the proposed simple AC embedding algorithm in the DCT domain
6.2 Exhaustive study results for the enhanced DCT-domain embedding algorithm

Chapter 1

Introduction

There has been rapid growth in the use of digital media recently. Digitized media not only reduce the physical storage required but also enable easier backup, searching and retrieval. As multimedia content is stored in digital format, it has become possible to make identical copies, modify the content or forge information with great ease using the powerful software and digital devices available. Digital watermarking has been proposed to protect digital media content and to prevent or discourage its illicit redistribution and reproduction by embedding copyright and authentication information within the media content. Compared with the plurality of proposed methods for digital watermarking of pictures and video [1–5], digital watermarking methods for bi-level images are very limited. One important reason for this difference is that bi-level images lack the rich gray-scale information that can be easily modified imperceptibly. Furthermore, for fair evaluation of watermarking algorithms for bi-level images, the popular peak signal-to-noise ratio (PSNR) is not suitable as a measure of visual distortion. An objective measure of visual distortion in bi-level images that is well correlated with human visual perception is needed.

The work included in this thesis provides an objective distortion measure for bi-level images and investigates feasible digital watermarking techniques for bi-level images. In this chapter, we give a brief overview of digital watermarking, introduce our motivation and objectives, and outline the organization of this thesis and its major contributions to digital watermarking for the protection of digital documents in a bi-level image format.

1.1 Digital Watermarking of Multimedia Content

While digital media offer many distinct advantages over analog media, these advantages have also created a pressing need to protect digital information against illegal duplication and manipulation. In the digital domain, with the ease of editing, perfect reproduction and massive Internet distribution, the protection of ownership and the prevention of unauthorized tampering of digital media content become important concerns. Traditional methods for protecting digital media include encryption techniques and digital time stamping. These security protocols serve to secure the communication channel between two parties by restricting access. However, they only protect data in transmission and cannot prevent the receiving party from illicit duplication or manipulation of the cleartext versions of the protected data received [6]. Digital watermarking, the direct embedding (hiding) of subliminal information (the watermark) into digital media content, has been proposed as an additional tool to protect digital media content. A digital watermark is information that is embedded within the host data to provide information about the origin, status, and/or destination of the host data [6]. The basic requirements

Figure 1.1: Generic digital watermark embedding scheme.

Figure 1.2: Generic digital watermark detection scheme.

of watermarking are imperceptibility, robustness and capacity. They conflict with each other, and the design tradeoffs between them are linked closely to the application scenario. Applications of digital watermarking include transaction tracking (fingerprinting), proof of ownership, device control (copy/access control), authentication (tamper detection), data hiding (annotation and broadcast monitoring), legacy system enhancement and database linking [4, 7]. The generic embedding process described in [1] is shown in Fig. 1.1. Given an original image I_o, a watermark W and a key K, the watermark embedding process is defined as a mapping I_o × K × W → I_w. Fig. 1.2 illustrates the generic detection process, where the output is either the recovered watermark W or some confidence measure indicating how likely it is that the given watermark W is present in the test image I_w [1].
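The generic embedding and detection mappings can be sketched as a pair of keyed functions. The toy scheme below, keyed pseudo-random pixel flipping with a simple match score, only illustrates the interface I_o × K × W → I_w and its detector; the function names and the flipping rule are illustrative inventions, not an algorithm from this thesis.

```python
import random

def embed(image, watermark, key):
    """Toy embedder: for each watermark bit equal to 1, flip the pixel at a
    key-selected location.  `image` is a list of lists of 0/1 pixels."""
    marked = [row[:] for row in image]
    rng = random.Random(key)                       # key drives the locations
    h, w = len(image), len(image[0])
    locations = rng.sample([(r, c) for r in range(h) for c in range(w)],
                           len(watermark))
    for bit, (r, c) in zip(watermark, locations):
        if bit:
            marked[r][c] ^= 1                      # flip black <-> white
    return marked

def detect(test_image, original, watermark, key):
    """Toy detector: re-derive the keyed locations and count how many of the
    expected flips are present; returns a confidence score in [0, 1]."""
    rng = random.Random(key)
    h, w = len(original), len(original[0])
    locations = rng.sample([(r, c) for r in range(h) for c in range(w)],
                           len(watermark))
    hits = sum(1 for bit, (r, c) in zip(watermark, locations)
               if (test_image[r][c] != original[r][c]) == bool(bit))
    return hits / len(watermark)
```

Note that this detector needs the original image, so it corresponds to a non-blind (source extraction) scheme in the classification discussed in Chapter 2.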

1.2 Motivation

As the use of digital media grows rapidly, not only are multi-level images, video, and audio in digital form; bi-level images are also digitized in applications involving electronic documents such as digitized handwritten signatures, legal and financial documents, digital books, maps, and architectural and electronic drawings. Most digital image watermarking techniques in the literature are proposed for gray-scale/color images [1–5], while digital watermarking methods for bi-level images are quite limited in comparison. Systems working on gray-level images, in which pixels may take on a wide range of values, are not directly applicable to bi-level images, in which there are only two pixel values and no small gray-level variations. Any modification of a bi-level image is a flip from one level to the other. Thus, watermark embedding without causing visibly noticeable artifacts is more difficult for bi-level images.

In the remainder of this thesis, we will use the term binary images instead of bi-level images. There is no essential difference between the two terms, which are only differentiated by the context in which they are used. Bi-level means that there are only two levels in the image; it is a term from the application point of view. The two levels can be any two color levels (red, green, ...) or any two gray levels. Binary means that only one bit is used to represent a pixel in an image; it is a term from the view of digital processing, in which one level is treated as 0 and the other as 1. Binary images are more commonly used in digital image processing, and by default they refer to black-and-white-only images. If an algorithm works on binary images, it can work on bi-level images as well. Therefore, in this thesis we deal with black and white images only, and we use the term binary images rather than

bi-level images throughout the thesis.

1.3 Objectives

The objectives of this work are to develop an objective distortion measure for binary images that correlates well with human visual perception, to use this measure to aid the design and evaluation of binary image watermarking algorithms, to study the feasibility of and gain insights into the frequency-domain approach for binary images, and to advance the overall technology through improved or new algorithm development in the spatial and/or frequency domain of binary images.

1.4 Thesis Organization and Contributions

In this chapter, a brief introduction to the thesis is provided, including the background of our research work, the motivation, the objectives and the main contributions. In the next chapter, we discuss the categorization of binary images and the classification of watermarking techniques, give an overview of the binary image watermarking algorithms in the literature, and describe the assumptions made in the presentation of the proposed measure and algorithms. The subsequent chapters of this thesis are organized to highlight the four major contributions described in the following.

The proposal of an objective distortion measure

An objective distortion measure for binary document images is proposed in Chapter 3 for evaluating the visual distortion in such images. It is called the distance-reciprocal distortion measure (DRDM), and it is proposed to

replace PSNR in measuring the visual distortion in binary document images. The development of this measure was inspired by the lack of such an objective measure for binary document images that correlates closely with human visual perception. It is an important tool for performance comparison among different binary image watermarking algorithms, and it can itself be used to develop such algorithms.

The development of a fragile watermarking algorithm

Chapter 4 presents a secure fragile watermarking algorithm developed based on the DRDM method for the authentication of binary document images. Experiments show that it is able to detect tampering of watermarked binary document images, and that it is possible to localize tampering in the watermarked image.

The validation of a DC component embedding approach and the proposal of a simplified watermarking algorithm

In Chapter 5, a DC component embedding approach in the Discrete Cosine Transform (DCT) domain is analyzed and validated. From the insights gained in the derivation and the understanding of the approach, a new watermarking algorithm is proposed with some robustness against cropping and additive noise. The new algorithm is much simpler than the DC component embedding approach, and it provides more flexibility in controlling the quality of watermarked images and the robustness.

The proposal of a robust watermarking algorithm using a frequency-domain approach

Watermark embedding in the frequency domain has been shown to be hardly feasible for binary images in one work [8], and many others

admit that it is very hard. Chapter 6 proposes a feasible approach to embed watermarks in binary images through frequency-domain modification. Some robustness against cropping and additive noise is shown.

Finally, Chapter 7 concludes with a summary of the contributions and recommends some future research directions worth investigating.

Chapter 2

Preliminaries

2.1 Categorization of Binary Images

We categorize binary images into two groups: binary document images and halftone images. Binary document images refer to binary images that have sharp contrast between black and white, with clear boundaries between black and white areas. Halftone images are dithered gray-level images in which black and white pixels are well interlaced. The binary images that we deal with in this thesis are binary document images. A representative set of binary document images is the eight standard CCITT binary images [9, 10], shown in Figs. 2.1 and 2.2. The distortion measure and watermarking algorithms proposed in this thesis are applicable to all these binary document images.
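The distinction between the two groups can be illustrated with a simple heuristic: because halftones interlace black and white pixels while document images have large uniform regions, the fraction of adjacent pixel pairs that differ separates them. This heuristic and its threshold are our own illustration, not a classifier from the thesis.

```python
def transition_density(image):
    """Fraction of horizontally adjacent pixel pairs that differ.
    High for halftones (well-interlaced black and white), low for
    document images (large uniform black or white areas)."""
    pairs = diffs = 0
    for row in image:
        for a, b in zip(row, row[1:]):
            pairs += 1
            diffs += (a != b)
    return diffs / pairs

def looks_like_halftone(image, threshold=0.25):
    # The threshold value is an illustrative choice, not from the thesis.
    return transition_density(image) > threshold
```

For example, a checkerboard pattern (an extreme halftone) has density 1.0, while a half-black, half-white page has a density near 0.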

(a) CCITT1. (b) CCITT2. (c) CCITT3. (d) CCITT4.
Figure 2.1: CCITT test images 1 to 4.

(a) CCITT5. (b) CCITT6. (c) CCITT7. (d) CCITT8.
Figure 2.2: CCITT test images 5 to 8.

Figure 2.3: Classification of digital watermarking techniques.

2.2 Classification of Digital Watermarking Techniques

We adopt the classification scheme in [11] for existing digital watermarking techniques, as shown in Fig. 2.3, in which there are two main types of digital watermarks: perceptible and imperceptible. Perceptible watermarking techniques create noticeable changes in the host signal when added, such as a visible ownership logo, but do not severely impede the host signal from communicating the original message. They have been implemented commercially with success, and they are outside the scope of this thesis. In [11], the class of imperceptible watermarks is divided further into two subclasses: fragile and robust. Fragile watermarks are embedded into the host signal so that almost any unwanted manipulation of the watermarked signal alters the

extracted watermark, providing information about the tampering process. In comparison, robust watermarks are embedded in the host such that it is difficult to remove them. Thus, they need to be resilient to unintentional common processing and intentional attacks. Two types of watermarking methods are further defined in [11] within the subclass of robust methods: source extraction and destination extraction. Source extraction techniques, often called non-blind watermarking, require that the original host signal be available to extract the watermark; they have higher robustness to signal distortions at the expense of practicality, as the host signal may not always be available. On the other hand, destination extraction schemes, often called blind watermarking, require only a key for watermark extraction. As shown in Fig. 2.3, the three watermarking algorithms that we propose in Chapters 4, 5 and 6 of this thesis fall under the categories of fragile, destination extraction and source extraction watermarking, respectively. We discuss these algorithms in detail in the corresponding chapters.

2.3 Previous Methods on Binary Image Watermarking

Most digital image watermarking techniques in the literature are proposed for gray-scale or color images, while digital watermarking for binary images is less well addressed. It is widely agreed that this is a challenging problem because changes (flips) of pixels in binary images are more likely to be visually noticeable. Recently, we have seen a growing number of papers addressing this area. M. Chen et al. give an overview of the recent developments in [12], and summarize

and classify the available binary image watermarking and data hiding techniques according to the embedding methods: text line, word, or character shifting; boundary modifications; fixed partitioning of the image into blocks; modification of character features; modification of run-length patterns; or modification of halftone images. They also discuss important issues such as robustness and data hiding capacity. They point out that techniques robust to printing, scanning, photocopying and facsimile transmission are suitable for applications involving hardcopy distribution of documents, while techniques offering high embedding capacity with little robustness are useful in applications where documents are distributed in electronic form and no printing, scanning or photocopying of hardcopies is involved. As expected, a tradeoff between embedding capacity and robustness is observed. In this section, we give a brief survey of the binary image watermarking algorithms in the literature, discussed according to the kinds of binary images to which they are applicable.

Algorithms for General Binary Images

M. Wu et al. [13] proposed a data hiding algorithm for general binary images. They determine whether a pixel can be flipped by examining the pixel and its 8 neighbors to establish a score indicating how noticeable a flip of that pixel would be, considering the change in smoothness and connectivity. Pixels with lower scores are suitable candidates for flipping. Data is embedded by flipping the pixel with the lowest score in a block to force the total number of black pixels in the block to be odd (bit 1) or even (bit 0). Shuffling, a random permutation of all pixels in the image, is done after the flippable pixels have been identified based on the scores, to handle the uneven embedding capacity due to the uneven distribution of flippable pixels.
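The block-parity idea above can be sketched as follows. Note that this is a simplified illustration: the stand-in noticeability score below merely counts same-valued neighbors, replacing the smoothness/connectivity analysis of [13], and the shuffling step is omitted; pixels with value 1 are assumed to be black.

```python
def noticeability(img, r, c):
    """Stand-in flippability score: a pixel whose 8-neighborhood is uniform
    is very noticeable to flip (high score); a pixel on a busy boundary is
    less noticeable (low score)."""
    h, w = len(img), len(img[0])
    same = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and img[rr][cc] == img[r][c]:
                same += 1
    return same

def embed_bit_in_block(block, bit):
    """Force the parity of the black-pixel (value 1) count in `block` to
    match `bit` (odd = 1, even = 0) by flipping the least noticeable pixel."""
    black = sum(sum(row) for row in block)
    if black % 2 == bit % 2:
        return block                        # parity already carries the bit
    h, w = len(block), len(block[0])
    r, c = min(((r, c) for r in range(h) for c in range(w)),
               key=lambda p: noticeability(block, *p))
    block = [row[:] for row in block]
    block[r][c] ^= 1                        # flip exactly one pixel
    return block

def extract_bit(block):
    return sum(sum(row) for row in block) % 2
```

Extraction needs no key or original image here; in the actual method, the key-controlled shuffling is what provides security and evens out the capacity.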

Y. C. Tseng et al. have proposed novel data hiding schemes with large embedding capacity [14–17]. They employ a binary matrix and an integer weight matrix as secret keys to hide a large amount of data by changing a small number of bits (pixels) in the original image, so as to ensure the validity of a defined invariant involving modular arithmetic on the result of binary (exclusive-OR) operations and pair-wise multiplication of the binary matrix, the original image block, and the weight matrix.

J. Zhao and E. Koch [18] embed robust watermarks into binary images by altering the numbers of black and white pixels in selected 8 × 8 blocks. The percentage of white pixels is forced into one range (e.g., 50%–60%) if bit 1 is embedded, and into another range (e.g., 40%–50%) if bit 0 is embedded. A selected block is considered invalid when too much flipping would be needed. For sharply contrasted binary (document) images, the pixels that have the most neighbors with the opposite pixel value are flipped; for dithered binary (halftone) images, the pixels that have the most neighbors with the same pixel value are flipped.

Algorithms for Text Document Images

Altering the text formatting of formatted text

S. H. Low, N. F. Maxemchuk and co-authors proposed text document watermarking by altering the text formatting [19–26]. Specifically, they encode watermark information by shifting lines or words by an indiscernible amount. Feature detection, correlation detection and centroid detection are applied to the horizontal and vertical profiles to extract the watermark.

In [27], J. Brassil and L. O'Gorman use the height of a bounding box

enclosing a group of words to embed data. They increase the height of the bounding box by adding a small number of pixels to the endlines of characters with ascenders or descenders, or by displacing selected words or characters vertically off the logical text baseline.

D. Huang and H. Yan [28] proposed slightly modifying the average interword spaces of different text lines such that the interword space changes form the sampling points of a sine wave, which carries the watermark information. They call this space coding, meaning that the space patterning of text documents is used for watermarking.

N. Chotikakamthorn [29] embeds data by adjusting the widths of a few consecutive character spaces on the same text line according to a predefined rule. The advantage of this algorithm over word spacing (shifting) is that it can be applied to written languages that do not have sufficiently wide spaces at word boundaries, such as Chinese, Japanese and Thai.

Y. Liu et al. [8] proposed a combined approach that marks a text document by the line or word shifting proposed by S. H. Low et al. above, and detects the watermark in the frequency domain by the algorithm proposed by I. Cox et al. [30]. The watermark is computed in the frequency domain after the spatial embedding through shifting.

Altering character features

J. T. Brassil et al. proposed text document watermarking by altering certain text features of characters such as height, relative position, and vertical endlines [21, 25, 26].

Q. G. Mei et al. proposed a method for data hiding in binary text documents by embedding data in the 8-connected boundary of a character

in [31]. They identified a fixed set of pairs (28 pairs in the paper) of five-pixel-long boundary patterns for embedding data. The two patterns in each pair are duals of each other, and changing (flipping) the pixel value of one pattern at the center position results in the other. Thus the two patterns in each pair code (embed) one bit.

T. Amano and D. Misaki [32] proposed a feature calibration method for watermarking, in which two sets of partitions are symmetrically arranged and the difference between the average feature values of the two sets is extracted. The feature used in the paper is the average width of the horizontal strokes of characters. Two operations, make fat and make thin, are defined to modify (increase and decrease the width of) the horizontal strokes in target partitions to embed information.

Algorithms for Halftone Images

M. S. Fu and O. C. Au have published a series of papers on data hiding in halftone images [33]. Some of their methods [34–36] can be applied without the original grayscale image and halftoning method, i.e., when only the halftone image is available. They are called Data Hiding by Pair-Toggling (DHPT), Data Hiding by Self-Toggling (DHST), and Data Hiding by Smart Pair-Toggling (DHSPT). Data is embedded at pseudo-random locations by forced toggling of a pixel or of a pair of white and black pixels, or the location of toggling is selected to minimize the distortion. Other algorithms [33, 37–41] assume that the original grayscale image is available, and integrate data hiding into the error diffusion process (a halftoning method). Some of these algorithms apply DHST and then diffuse the distortion [33], some embed data in the parity domain [37, 39], and others embed a simple image in two related error-diffused halftone images or

two parts of a single halftone image, utilizing some conjugate properties such that the hidden image can be viewed directly when the two images or the two parts of the single image are overlaid [38–40].

Z. Baharav and D. Shaked [42] use two different dithering matrices in the halftoning process to encode information; the different statistical properties due to the different matrices are detected in extraction.

H. A. Wang [43] proposed two data hiding methods: modified ordered dithering, which replaces one out of every 16 neighboring pixels in an ordered or pre-programmed way, and modified multiscale error diffusion, which uses the multiscale error diffusion algorithm [44] with fewer floors in the image pyramid, where half of the progressive binarization process is used to encode data.

K. Tanaka et al. [45] proposed a data hiding scheme using the statistical properties of dithered (halftone) images. The scheme controls the dot patterns in the ordered dithering process with a sequence of bits of characters, i.e., the data to be embedded.

There are also patents using stochastic screen patterns [46] and conjugate halftone screens [47] to embed visual patterns in two halftone images; the hidden pattern is revealed when the two images are superimposed.

Other Algorithms

Algorithm for facsimile images

In [48], K. Matsui and K. Tanaka proposed embedding data in facsimile images by manipulating the run-lengths, shortening or lengthening the runs of black pixels.

Algorithm for binary comic images

N. Kobori et al. proposed a watermarking scheme for binary comic images in [49]. Copyright data is represented by the pixel distribution, through uniformly expanding or shrinking the areas of black pixels.

Algorithm for map and text images

M. Pierrot-Deseilligny and H. Le-Men described an algorithm to embed small binary patterns (the watermark) into a large binary map or text image in [50]. The watermarked image is obtained through the combination of the original image and its dilation, conditional on an image that is a periodic repetition of the watermark. In watermark recovery, the average of a sufficient number of sub-images of the watermarked image is enhanced to reveal the watermark image.

2.4 The Proposed Algorithms and Assumptions

Among the literature reviewed, the evaluation of the quality of watermarked binary images is mostly subjective, and there is no suitable objective distortion measure. Therefore, we developed the DRDM method. We also propose a fragile watermarking algorithm for tamper proofing and localization of general binary document images based on our new distortion measure. While many have claimed that applying a frequency-domain approach to binary images is difficult, we study the feasibility of this approach to gain some insights. In the following discussion, we state several assumptions:

The scope of applicable binary images

The objective distortion measure and binary image watermarking algorithms proposed in this thesis are applicable to binary document images as defined at the beginning of this chapter. We do not consider halftone images in this thesis.

Figure 2.4: Test image cut from CCITT4 for watermarking experiments.

The image used to show the experimental results. The proposed algorithms are applicable to various binary document images as shown in Figs. 2.1 and 2.2, and the performance on these images is quite similar. To save space in presentation, we give the results on a typical text document image only, as shown in Fig. 2.4. This image is cut from the CCITT4 image in Fig. 2.1(d), and it is shown in the presentation of experimental results for every algorithm for easy comparison against the quality of the watermarked image.

Robustness test. For the proposed algorithms that offer some robustness, we are concerned with the digital domain only and did not test the robustness against printing, scanning and photocopying. The research work in this thesis focuses more on the feasibility of the proposed algorithms and some preliminary study. Therefore, when robustness needs to be tested, we follow the robustness testing in [8] and only test the robustness against cropping and additive white Gaussian noise degradation.

The watermark. In the proposed watermarking algorithms, a random bit stream of 0's and 1's is used as the watermark in our experiments. Similar results would be obtained if the bit stream carried any meaningful digital information, such as a string of characters or numbers, or a binary or even gray-level image.

Security concern for applications. The focus of this thesis is on watermarking algorithms rather than on developing a complete system for a specific application. Therefore, security concerns in applications are not considered in detail. In practice, other security measures, such as encryption/decryption systems [51, 52] and biometrics [53], can be integrated with the proposed watermarking algorithms to build systems for practical applications.
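The cropping and additive-noise degradations used in the robustness tests above can be simulated in a few lines. The following is a minimal sketch assuming 0/1 NumPy arrays; the function names and the re-binarization threshold are illustrative choices, not taken from the thesis.

```python
import numpy as np

def add_awgn_to_binary(img, sigma, seed=0):
    """Degrade a binary (0/1) image with additive white Gaussian noise,
    then re-binarize by thresholding at 0.5 (an assumed test setup)."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(float) + rng.normal(0.0, sigma, img.shape)
    return (noisy > 0.5).astype(np.uint8)

def crop(img, top, left, height, width):
    """Simulate a cropping attack by keeping only a sub-rectangle."""
    return img[top:top + height, left:left + width]
```

For very small noise levels the thresholding restores the original image, while larger sigma values flip an increasing fraction of the pixels, which is what the robustness test exercises.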

Chapter 3
Distance-Reciprocal Distortion Measure

Image quality after processing is always a major concern in evaluating the performance of image processing applications. Not surprisingly, one of the important requirements in digital image watermarking is imperceptibility. In other words, the distortion resulting from the watermarking process should be perceptually invisible. Visual distortion may be present in some other binary document image applications as well. Thus, it is important to measure such distortion for performance comparison or evaluation of these applications. Popular traditional distortion measures widely used in image and video processing are not suitable for binary images. As pointed out in a recent review [12], quantitative methods should be developed to evaluate the quality of watermarked binary document images. On the other hand, while a good objective distortion measure can be developed by studying human visual system (HVS) models, the understanding of HVS features has a strong influence on the design of a watermarking algorithm [5, 6].

In this chapter, after a brief review of the HVS models and distortion metrics in the literature, we propose a novel objective distortion measure for binary document images that is based on human visual perception. The distance between pixels is found to play an important role in human perception of visual distortion in these images. Hence, the reciprocal of distance is used to measure visual distortion in digital binary document images, and it is straightforward to calculate. Subjective testing results show that the proposed objective distortion measure matches well with subjective evaluation by human visual perception.

3.1 HVS Models

HVS models are very important in image processing applications such as compression and watermarking [5, 6]. A typical HVS model, consisting of color processing, decomposition, contrast and adaptation, contrast sensitivity, and masking [54], is described briefly below.

The color processing stage concerns the transformation into an adequate perceptual color space, usually based on opponent colors. After this stage, the image is represented by one achromatic and two chromatic channels carrying color difference information.

In the second stage, decomposition, it is widely accepted that the HVS bases its perception on multiple channels that are tuned to different ranges of spatial frequencies and orientations. Measurements of the receptive fields of simple cells in the primary visual cortex revealed that these channels exhibit approximately a dyadic structure. It is believed that there are also a number of channels processing different object velocities or temporal frequencies.

The third stage is contrast and adaptation. The response of the HVS depends much less on the absolute luminance than on the relation of its local variations to the surrounding background [54]. Contrast is a measure of this

relative variation. While it is quite simple to define a contrast measure for elementary patterns, it is very difficult to model human contrast perception in complex images, because it varies with the local image content. Furthermore, the adaptation to a specific luminance level or color can influence the perceived contrast.

Human contrast sensitivity depends on the spatial and temporal frequency of the stimuli. The contrast sensitivity function (CSF) models this phenomenon. The correct modelling of the CSF is especially difficult for color images. Typically, separability between color and pattern sensitivity is assumed, so that a separate CSF for each channel of the color space needs to be determined and implemented.

Masking occurs when a stimulus that is visible by itself cannot be detected due to the presence of another. Sometimes the opposite effect, facilitation, occurs: a stimulus that is not visible by itself can be detected due to the presence of another. It is helpful to think of the distortion being masked by the original image acting as background. Masking explains why similar distortions are disturbing in certain regions of an image while they are hardly noticeable elsewhere. There are several types of spatial masking: contrast masking due to strong local contrast, edge masking at edges, and texture masking due to local activity. The most important HVS features for the designer of a watermarking algorithm are the contrast sensitivity and the masking capability [6].

3.2 Prior Art on Distortion Measures

There are two ways to measure visual distortion, as discussed in [55]. One is subjective measure and the other is objective measure. Subjective measure

is important since the human is the ultimate viewer. However, it is very costly to conduct subjective testing. On the other hand, objective measurement is repeatable and easier to implement, although such a measure does not always agree with the subjective one.

3.2.1 Traditional Objective Distortion Measures

The mean square error (MSE), signal-to-noise ratio (SNR), and peak signal-to-noise ratio (PSNR) [55] are popular distortion measures for gray-level/color images. For an image processing system with f(x, y) as the input image and g(x, y) as the processed output image, the distortion d(x, y) is obtained from the difference between the input and output images:

d(x, y) = g(x, y) - f(x, y)    (3.1)

Hence,

MSE = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} d(x, y)^2    (3.2)

where M and N are the dimensions of the image. The corresponding SNR and PSNR are defined as [56]:

SNR (dB) = 10 \log_{10} \frac{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)^2}{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} d(x, y)^2}    (3.3)

PSNR (dB) = 10 \log_{10} \frac{P^2 MN}{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} d(x, y)^2}    (3.4)

where P is the maximum peak-to-peak signal swing, e.g., P is 255 for 8-bit images. It can be seen that SNR and PSNR are both MSE-based, so these three measures are essentially equivalent.

For binary document images, these traditional distortion measures are not

(a) Original document image. (b) Distorted images.
Figure 3.1: Illustration of visual distortion in a document image.

well matched with subjective assessment, since they are point-based measurements and the mutual relations between pixels are not taken into account. For instance, a simple document image is shown in Fig. 3.1(a), and four differently distorted images are shown in Fig. 3.1(b). Each distorted image has four pixels with binary values opposite to those of their counterparts in Fig. 3.1(a). According to (3.2), (3.3) and (3.4), these four distorted images have the same MSE, SNR, and PSNR, yet the distortion perceived by our human visual system is quite different. These traditional distortion measures have only an approximate relationship with the perceived visual distortion because they are based on a simple pixel-by-pixel comparison of images. However, the visual information perceived is largely conveyed by the relative activities of neighboring image points [57], especially in binary images.

3.2.2 Other Objective Measures

Many authors have discussed the gap between subjective and traditional objective distortion measures, and they proposed solutions for objective distortion or quality measures for video or multi-level images [56, 58-64], mostly based on HVS models or commonly occurring distortion types. These measures are designed for images with rich gray scales, and they are not often applicable to binary document images.

S. Matsumoto and B. Liu [57] proposed simple analytic fidelity measures

based on the concept of contrast preservation for the analysis of halftoning techniques. Thus, the proposed measures are used to measure the distortion in the halftone images produced. H. S. Baird developed a document image defect model [65] for the image defects that occur during printing and scanning. This model has been used to construct metrics for minimum-distance classification of character images in optical character recognition (OCR) applications [66]. The model and metrics are for the measurement of classifier performance, the characterization of document image quality, and the construction of high-performance classifiers. In [67], A. J. Baddeley proposed an error metric for binary images. The metric is used to measure edge detection and localization performance in computer vision applications and segmentation in remote sensing. Thus, it is also for the estimation of classification errors rather than the measurement of visual distortion. M. Wu et al. in [13] score the visual distortion due to data hiding in binary images by measuring the change in smoothness and connectivity caused by flipping a pixel. However, it is a rule-based approach, and the analysis involved becomes extensive for a larger neighborhood.

3.2.3 Subjective Testing Methods

HVS models are built based on psychophysical experiments, which involve observers making subjective decisions. Subjective tests are the only true benchmark for evaluating the performance of perception-based image processing methods [54]. However, perceptual responses cannot be represented by an exact figure and can only be described statistically.

A. A. Webster et al. [64] introduced an objective measurement of video

quality based on human visual perception. The testing procedures involved are helpful in the evaluation of a new distortion measure. The original video, taken from a library of test scenes, is passed to an impairment generator to obtain a degraded video. Both the original video and the degraded video are then passed to an objective test, which gives objective test results, and to a subjective test, which obtains the assessment of a viewing panel. Statistical analysis is done on the objective test results and the viewing panel results to determine the quality assessment algorithm.

Recently, subjective assessment of visual quality has been formalized in ITU-R Rec. 500 [68]. It suggests standard viewing conditions, criteria for the selection of observers and test material, assessment procedures, and data analysis methods. Although it is intended for the subjective assessment of television pictures, it is directly applicable to still images. Two of the most commonly used methods are described here, namely the Double Stimulus Continuous Quality Scale (DSCQS) and the Double Stimulus Impairment Scale (DSIS).

In a DSCQS test, viewers are shown stimulus pairs consisting of a reference and a test stimulus, which are presented twice alternately, with the order of the two chosen randomly for each trial. Subjects are not informed which is the test stimulus and which is the reference. They are asked to rate each of the two separately on a continuous quality scale ranging from bad to excellent. Data analysis is based on the difference in rating for each pair, which is calculated from an equivalent numerical scale from 0 to 100. The DSCQS test has been shown to work reliably even when the quality of the test and reference stimuli is rather similar, because it is quite sensitive to small differences in quality.

In a DSIS test, the reference is always displayed before the test stimulus, and both are shown only once.
Subjects rate the amount of impairment in the test stimulus on a discrete five-level scale ranging from very annoying to

imperceptible. DSIS is the preferred method when evaluating clearly visible impairments.

3.3 Document Images in Human Eyes

Human visual perception of document images is quite different from that of natural images, which are usually continuous multi-tone images. Document images are essentially binary: there are only two levels, black and white. On the other hand, documents mostly consist of characters, which are more like invented symbols/signals than physical objects in natural images. Their images have an arbitrary nature as a result of the non-analytical artifacts of human history and culture [65]. Hence, the HVS models built for natural images, covering aspects such as color and contrast, may not be well suited for document images.

In turn, the perception of distortion in document images is also different from that in natural images. In a particular language, such as English, people know very well what a certain alphabetic character should look like. Hence, distortion in document images can be more obtrusive than distortion in natural images, and the distortion measures proposed for color/gray-level images [56, 58-64] are not often applicable to binary document images. On the other hand, the other measures discussed in Section 3.2.2 are not designed for the measurement of visual distortion in binary document images.

3.4 The Distance-Reciprocal Distortion Measure

We use a number of single-letter images to study distortion in binary document images. Each single-letter image is converted from a letter typed in MS Word with a font size of 10 or 12, including both uppercase and lowercase, using Adobe Acrobat 5.0 with a resolution of 150 dots per inch (dpi). One of them is shown in Fig. 3.1(a).

We observed that for a binary document image, the distance between two pixels plays a major role in their mutual interference as perceived by human eyes. As discussed above, readers are so familiar with alphabetic characters that even single-pixel distortion can be perceived easily. Therefore, the main factor in distortion perception is focusing, i.e., whether the distortion is in a viewer's focus. The distortion (flipping) of one pixel is more visible when it is in the field of view of the pixel in focus. The nearer the two pixels are, the more sensitive it is to change one pixel when focusing on the other. Further, from a magnified view, each pixel is essentially a black or white square. Therefore, a diagonal neighbor is considered to be further away from a pixel in focus than a horizontal or vertical neighbor. Hence, diagonal neighbors have less effect on a center pixel in focus than horizontal or vertical neighbors.

Based on these observations, we propose an objective distortion measure for binary document images. This method measures the distortion of a processed image g(x, y) compared with the original image f(x, y) using a weight matrix with each of its weights determined by the reciprocal of the distance from the center pixel. We name it the distance-reciprocal distortion measure (DRDM) method. Specifically, the weight matrix W_m is of size m x m, m = 2n + 1, n = 1, 2, 3, 4, 5, ... The center element of this matrix is at (i_C, j_C), i_C = j_C = (m + 1)/2. W_m(i, j), 1 <= i, j <= m, is defined as follows:

W_m(i, j) = \begin{cases} 0, & \text{for } i = i_C \text{ and } j = j_C \\ \frac{1}{\sqrt{(i - i_C)^2 + (j - j_C)^2}}, & \text{otherwise} \end{cases}    (3.5)

This matrix is normalized to form the normalized weight matrix W_{Nm}:

W_{Nm}(i, j) = \frac{W_m(i, j)}{\sum_{i=1}^{m} \sum_{j=1}^{m} W_m(i, j)}    (3.6)

The weight matrices before and after normalization are shown in Tables 3.1 and 3.2, respectively, for m = 5.

Table 3.1: Weight matrix before normalization (m = 5)
0.3536  0.4472  0.5000  0.4472  0.3536
0.4472  0.7071  1.0000  0.7071  0.4472
0.5000  1.0000  0       1.0000  0.5000
0.4472  0.7071  1.0000  0.7071  0.4472
0.3536  0.4472  0.5000  0.4472  0.3536

Suppose that there are S flipped (from black to white or from white to black) pixels in g(x, y); each such pixel will have a distance-reciprocal distortion DRD_k, k = 1, 2, 3, ..., S. For the k-th flipped pixel at (x, y)_k in the output image g(x, y), the resulting distortion is calculated from an m x m block B_k in f(x, y) that is centered at (x, y)_k. The distortion DRD_k measured for this flipped pixel

Table 3.2: Weight matrix after normalization (m = 5)
0.0256  0.0324  0.0362  0.0324  0.0256
0.0324  0.0512  0.0724  0.0512  0.0324
0.0362  0.0724  0       0.0724  0.0362
0.0324  0.0512  0.0724  0.0512  0.0324
0.0256  0.0324  0.0362  0.0324  0.0256

g[(x, y)_k] is given by

DRD_k = \sum_{i,j} [D_k(i, j) \times W_{Nm}(i, j)]    (3.7)

where the (i, j)-th element of the difference matrix D_k is given by

D_k(i, j) = |B_k(i, j) - g[(x, y)_k]|    (3.8)

Thus, DRD_k equals the weighted sum of the pixels in the block B_k of the original image that differ from the flipped pixel g[(x, y)_k] in the processed image. The pixel f[(x, y)_k] does not contribute directly to DRD_k since its weight is always zero. For possibly flipped pixels near the image edge or corner, where an m x m neighborhood may not exist, it is possible to expand the rest of the m x m neighborhood with the same value as g[(x, y)_k], which is equivalent to simply ignoring the missing neighbors. After W_{Nm} walks over all the S flipped pixel positions, we sum the distortion as seen from each flipped pixel visited to get the distortion in g(x, y) as:

DRD = \frac{\sum_{k=1}^{S} DRD_k}{NUBN}    (3.9)

Figure 3.2: Original binary document image for subjective testing of visual distortion.

where NUBN estimates the valid (non-empty) area in the image; it is defined as the number of non-uniform (not all black or all white) 8 x 8 blocks in f(x, y). The total pixel number MN is not used in the denominator because uniform areas (e.g., all-white pixel blocks) are common in binary document images and they may have a significant effect on the distortion value if used.

The proposed DRDM method provides an efficient way to measure visual distortion in binary document images. It is superior to PSNR in the sense that it takes human visual perception into account and hence correlates well with subjective assessment, which is the ultimate judge of distortion. This correlation is demonstrated by the experimental results presented in the next section.

3.5 Experimental Results

We carried out experiments to test how well the proposed distortion measure matches human visual perception. An approach similar to that in [64] has been taken, which is quite close to the DSIS test. We designed the image shown in Fig. 3.2 to be the original binary document image used in the subjective testing. It is converted from MS Word in the same way as described in the previous section, and the characters in the image are in different fixed-size fonts. The image contains 312 8 x 8 blocks, of which 122 (~39%) are non-uniform.
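Eqs. (3.5)-(3.9), including the NUBN normalization just defined, can be sketched in NumPy as follows. This is an illustrative implementation, not the thesis code: it assumes 0/1 integer arrays, and out-of-range neighbors are simply ignored, per the border handling of Section 3.4.

```python
import numpy as np

def weight_matrix(m):
    """Normalized distance-reciprocal weight matrix W_Nm (Eqs. 3.5-3.6)."""
    c = (m - 1) // 2                      # center index (0-based)
    W = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            if (i, j) != (c, c):
                W[i, j] = 1.0 / np.hypot(i - c, j - c)
    return W / W.sum()

def nubn(f, block=8):
    """Number of non-uniform (not all-black/all-white) blocks (NUBN)."""
    count = 0
    M, N = f.shape
    for r in range(0, M, block):
        for c in range(0, N, block):
            b = f[r:r + block, c:c + block]
            if b.min() != b.max():
                count += 1
    return count

def drd(f, g, m=5):
    """Distance-reciprocal distortion between original f and processed g,
    both 0/1 arrays, following Eqs. 3.7-3.9."""
    n = (m - 1) // 2
    W = weight_matrix(m)
    M, N = f.shape
    total = 0.0
    for x, y in zip(*np.nonzero(f != g)):     # flipped pixel positions
        for di in range(-n, n + 1):
            for dj in range(-n, n + 1):
                i, j = x + di, y + dj
                # neighbors of the flip in f that differ from the new value
                if 0 <= i < M and 0 <= j < N and f[i, j] != g[x, y]:
                    total += W[di + n, dj + n]
    return total / nubn(f)
```

Note that the center pixel f[(x, y)_k] always differs from the flipped value but carries weight zero, so it never contributes, matching the definition above.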

It is important to design a distortion generator that can generate a number of independent test images with various amounts of visual distortion. The design criterion is that, under the constraint that the number of flipped pixels is the same in each test image, the test images generated should have a wide variety in terms of how noticeable the flipping is. It has been shown through experiments that flipping a number of randomly selected pixels results in images with very similar amounts of visual distortion, measured both in DRD values and by human eyes. This is not desirable, since it would be very hard for the observers to give a reasonable ranking. We have built a number of distortion generators and tested them against the above criterion. After careful testing, we chose a distortion generator that is designed to do random flipping in a restricted neighborhood of the black pixels in the image. This generator is described below by showing its operation on the original binary document image in Fig. 3.2, which has 1763 black pixels. 40 pixels are flipped in the original image with some randomness to generate the test images with various amounts of visual distortion:

1. The positions of all 1763 black pixels are recorded in a matrix.

2. 40 black pixels out of the 1763 are randomly chosen using a random number generator with uniform distribution.

3. For each black pixel chosen, one pixel is flipped in its neighboring area. As shown in Fig. 3.3, the pixel to be flipped is randomly selected from the Band1 pixel (the black pixel itself), the eight Band2 pixels, or the sixteen Band3 pixels, with probability P1, P2 and P3, respectively, where P1 + P2 + P3 = 1. For the Band2 and Band3 cases, one neighbor is randomly chosen among the band pixels.

Figure 3.3: Distortion generation for subjective testing.

4. A total of 60,000 test images are generated in the experiment by running the generator 10,000 times for each of the six sets of P1, P2 and P3, with P3 = 0, 0.2, 0.4, 0.6, 0.8 and 1, P1 = (1 - P3)/10 and P2 = 9 P1.

5. The images generated with fewer than 40 flipped pixels are ignored. That is, the cases where at least one pixel is flipped more than once are dropped.

Since all the test images are generated from the same original image and they have the same number of flipped pixels, they have the same MSE, SNR and PSNR. The PSNR value is 27.32 dB according to (3.4). One set of the generated test images is shown in Fig. 3.4, with the corresponding PSNR and DRD values, calculated with m = 5, shown below the images.

Next, we divide all the generated test images into four groups according to the computed DRD values, with group 1 having the smallest values, group 4 having the largest values, and so on. The subjective assessment is done by 60 observers. Each observer is given the original image and four sets of test images, which are printed on 80 GSM quality paper using an HP LaserJet 4100 printer. Each set of test images consists of four test images randomly chosen from the four groups. The observers are asked to rank the visual quality of the four images in each set according to the visual distortion that he or she perceives when he or she views

(a)-(d) PSNR = 27.32 dB for all four images, with different DRD values.
Figure 3.4: One set of test images for subjective testing.

the images at a comfortable distance under normal indoor lighting conditions in labs. A smaller ranking score indicates less distortion. There are four rankings (1, 2, 3 and 4), with score 1 for the least distortion and 4 for the most distortion perceived.

The ranking scores collected from the 60 observers are analyzed and compared with the rankings according to the average DRD values (with m = 5), as shown in Table 3.3. Although the PSNR is the same for all the test images, the DRD values obtained differ across the distorted images, and their average values for the four groups have a high normalized correlation with the mean subjective rankings, indicating a very good match between our objective measure and the subjective evaluation. We choose m = 5 here to demonstrate the high correlation between our measure and the subjective rankings. Based on our experimental data, it was found that the correlations calculated using various values of m (3, 5, 7, ..., 15) are quite close. The DRD values for a larger m have a slightly lower correlation with the subjective measure.
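The band-based distortion generator of steps 1-5 above can be sketched as follows. This is a simplified illustration: the function names and boundary handling are assumptions, and the rejection of images with repeated flips (step 5) is noted in a comment rather than implemented.

```python
import numpy as np

def band_offsets():
    """Offsets for Band2 (the 8-neighborhood) and Band3 (next ring of 16)."""
    band2 = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
             if (di, dj) != (0, 0)]
    band3 = [(di, dj) for di in range(-2, 3) for dj in range(-2, 3)
             if max(abs(di), abs(dj)) == 2]
    return band2, band3

def distort(img, n_flips, p, seed=0):
    """Flip one pixel near each of n_flips randomly chosen black pixels.
    p = (P1, P2, P3): probabilities of flipping the black pixel itself,
    a Band2 neighbor, or a Band3 neighbor.  (Step 5 of the procedure
    would discard outputs in which some pixel was flipped twice.)"""
    rng = np.random.default_rng(seed)
    g = img.copy()
    band2, band3 = band_offsets()
    blacks = np.argwhere(img == 1)
    chosen = blacks[rng.choice(len(blacks), n_flips, replace=False)]
    for (x, y) in chosen:
        band = rng.choice(3, p=p)
        if band == 0:
            i, j = x, y
        else:
            offs = band2 if band == 1 else band3
            di, dj = offs[rng.integers(len(offs))]
            i, j = x + di, y + dj
        if 0 <= i < img.shape[0] and 0 <= j < img.shape[1]:
            g[i, j] = 1 - g[i, j]
    return g
```

Running the generator with the six (P1, P2, P3) settings listed in step 4 spreads the flips farther from the black strokes as P3 grows, which is what produces test images of equal PSNR but widely varying perceived distortion.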

The distribution of the subjective ranking scores for each group is shown in a sub-figure in Fig. 3.5. In each sub-figure, the abscissa represents the four ranking scores (1, 2, 3 and 4), and the ordinate shows the counts of the corresponding ranking scores given by the 60 human evaluators. Since each of the 60 observers is given four sets of test images, there are 240 scores in total for each group.

Table 3.3: Experimental results of the subjective testing for the proposed distortion measure. Columns: test images (Groups 1-4), mean subjective rank, PSNR (dB), and average DRD (m = 5).

3.6 Chapter Summary

The performance of many image processing applications, such as digital image watermarking, is closely related to the human visual perception of the resulting distortion. In this chapter, we propose an objective distortion measure for binary document images. This measure is derived from our observation that for binary document images, the distance between pixels plays a major role in their visual interference, and it is called the distance-reciprocal distortion measure. Experimental results have shown its high correlation with subjective assessment and demonstrated that DRDM matches human visual perception better than MSE, SNR, or PSNR. This measure is useful in a wide range of applications involving visual distortion in digital binary document images, such as watermarking, data hiding and lossy compression.

Figure 3.5: Distribution of subjective ranking scores.

However, this distortion measure is not suitable for halftone (dithered) images, in which black and white pixels are well interlaced and graininess is desired. In this case, an analogous measure could be developed with the weight matrix elements proportional to the distance rather than to its reciprocal.

Chapter 4
Fragile Watermarking Based on DRDM

This chapter presents a fragile watermarking algorithm for binary document images based on the distance-reciprocal distortion measure proposed in the previous chapter. In watermark embedding, DRDM is used to evaluate the amount of visual distortion caused by flipping a pixel in binary document images, and the pixels that cause less visual distortion when flipped are the preferred candidates for flipping. We do the embedding by enforcing the odd-even features of non-uniform blocks, and we employ a 2-D shifting to provide security for tamper proofing and authentication. More security can be provided by keeping the processing block size secret. Experimental results show that the watermarked binary document image has good quality and that tampering of the content can be detected successfully.

4.1 Introduction

In [13], M. Wu et al. proposed a data hiding algorithm for digital binary images. A set of rules is used to calculate the flipping scores of pixels, considering the change in smoothness (measured by horizontal, vertical and diagonal transitions) and connectivity (measured by the number of black and white clusters) in the 3 x 3 window centered at a pixel. A look-up table containing the scores is built for use in flipping. Shuffling is employed to handle the problem of uneven embedding capacity. Such a rule-based approach becomes very complicated when we want to consider a larger neighborhood for better discrimination.

The distance-reciprocal distortion measure proposed in the previous chapter for binary document images provides a new way to choose pixels based on the visual distortion resulting from flipping them. This measure has been shown to have good correlation with human visual perception. Moreover, it has a simple form and is straightforward to calculate for a window of any size. Therefore, we propose a secure fragile watermarking algorithm based on DRDM for the purpose of authenticating digital documents in a binary image format. We propose a 2-D shifting technique to provide security against malicious tampering and employ a simple odd-even embedding scheme for watermark embedding. The processing block size can be kept secret to provide more security. The DRDM method is used to choose the appropriate pixels to flip. Experiments show that the algorithm has good imperceptibility and can be used for tamper proofing and authentication of binary document images.

When using DRDM in watermark embedding, we handle the cases of possibly flipped pixels near the corners or borders, where an m x m neighborhood may not exist, in a slightly different way. We choose to expand the rest of the m x m neighbors with the value of the background color of the image, so that in such

areas, a pixel flipped to the background color has a small distortion, while a pixel flipped to the color opposite to the background, which is likely to be obvious against the background, has a large distortion.

4.2 Effects of Varying Weight Matrix Size

We studied the effects of varying the weight matrix size m on the distribution of pixels (assuming they are flipped) with a smaller DRD in the non-uniform 8 x 8 blocks. We use the test text image in Fig. 4.3(a) to demonstrate the effects. Fig. 4.1 shows the distribution of pixels (assuming flipped) with DRD less than 0.5 for the non-uniform 8 x 8 blocks of the image in Fig. 4.3(a), with weight matrix size m = 3, 7 and 11. We can observe that as m increases, the pixels with DRD less than 0.5 tend to distribute more evenly. Thus, the problem of uneven embedding capacity [13] becomes much less serious for a larger m. Not including the uniform (all black/white) blocks also helps to reduce this problem.

In [13], random shuffling is employed to handle the problem of uneven embedding capacity among blocks. However, it is not guaranteed that every block has at least one flippable pixel, as defined by the authors. This is also why the authors choose a larger processing block size. Although the embedding capacity is evened out, the total embedding capacity is limited, since smaller processing blocks may not be suitable: a smaller processing block size could result in blocks with no flippable pixels, which is a typical manifestation of uneven embedding capacity. For those images in which empty space is dominant, such as the CCITT2 binary test image, an even larger block size might be necessary to ensure that every block has at least one flippable pixel. Thus, for the algorithm in [13], a suitable block size may need to be determined from the

Figure 4.1: Distribution of pixels (assuming flipped) with DRD < 0.5 for the non-uniform 8 x 8 blocks of the image shown in Fig. 4.3(a).

On the other hand, we note that in the scoring system proposed in [13], there are only 7 distinct flippable scores for a 3 x 3 neighborhood. In comparison, using DRDM, there are 25 possible DRD values for m = 3. The number of possible DRD values for a single flipped pixel grows rapidly as m increases. The larger this number is, the fewer pixels will have the same DRD value in a particular image, which makes it easier to discriminate and to choose the right pixels to flip.

4.3 Fragile Watermarking Based on DRDM

Fig. 4.2 shows the system flow of the proposed fragile watermarking algorithm for binary document images. We embed the watermark W into the original binary image f(x, y) using a shift key K_s to obtain the output image g(x, y).
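The count of 25 distinct DRD values for m = 3 quoted in Section 4.2 can be checked by brute-force enumeration over all 2^8 neighbor-difference patterns around a single flipped pixel. This is an illustrative verification, not part of the thesis; the function name is mine.

```python
import numpy as np
from itertools import product

def distinct_drd_values(m=3):
    """Count the distinct DRD_k values a single flipped pixel can take,
    enumerating every 0/1 difference pattern over its m*m - 1 neighbors."""
    c = (m - 1) // 2
    W = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            if (i, j) != (c, c):
                W[i, j] = 1.0 / np.hypot(i - c, j - c)
    W /= W.sum()                          # normalized weights (Eq. 3.6)
    w = [W[i, j] for i in range(m) for j in range(m) if (i, j) != (c, c)]
    # each pattern selects a subset of weights; round to absorb float noise
    values = {round(sum(wk for wk, d in zip(w, bits) if d), 10)
              for bits in product((0, 1), repeat=len(w))}
    return len(values)
```

For m = 3 the four horizontal/vertical weights are equal and the four diagonal weights are equal, so a DRD_k value is determined by the pair of counts (0-4 each), giving the 5 x 5 = 25 distinct values cited above.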

The image is of size M x N and the processing block size is 8 x 8. A smaller block size provides a larger embedding capacity while resulting in greater visual distortion (poorer quality), and vice versa. On the other hand, the sensitivity to tampering increases as the processing block size decreases. To be more secure, we can keep this information (the block size) secret. The watermark W can be any plain or encrypted digital information, such as digitized signatures, faces or fingerprints.

4.3.1 2-D Shifting

To embed the watermark, we first obtain f_s(x, y), a shifted version of f(x, y), using K_s. We do a circular left-shifting of N_s rows at a time for all M rows first, and then N_s columns at a time for all N columns. The amount of each shift is determined by L_s bits from the key. Hence, K_s is of length

L_{ks} = L_s (\lceil M/N_s \rceil + \lceil N/N_s \rceil)    (4.1)

where \lceil \cdot \rceil is the ceiling function, 1 <= N_s <= min(M, N), and 1 <= L_s <= \log_2 [\max(M, N)].

The 2-D shifting using K_s provides security for the simple odd-even embedding strategy [13] that we adopt. The odd-even embedding strategy is vulnerable when applied alone, since an adversary can modify the watermarked image without affecting the odd-even features [69]. The 2-D shifting makes it difficult to learn the watermark from the odd-even features of the watermarked image without knowledge of the key, and an adversary can hardly modify the watermarked image without affecting the odd-even features of the shifted version used in extraction.

The security level provided by this shifting increases as L_s increases, as N_s

decreases, and as the size of the original image increases. For a 512 × 512 image, the maximum number of possible ways of shifting is 2^36 = 68,719,476,736 (when L_s = 9 and N_s = 1), since there are 2^9 (maximal) ways to shift each row or column. The amount of shifting depends only on the shift key, and the shifting of different N_s rows/columns is independent. Although it is possible for an adversary to try all possible ways of shifting, especially for a smaller L_s and a larger N_s, we can use an encrypted watermark so that he/she will not be able to tell which way of shifting is correct, since the encrypted watermark message appears random and conveys no meaningful information without correct decryption. Thus, an adversary can only guess a possible shifting and has little chance to succeed. On the other hand, while we can only answer whether the test image is tampered or not when using a random shuffling [13], the 2-D shifting makes it possible to localize the tampering, i.e., to answer which part of the test image has been tampered with when tampering is detected. There is a trade-off between the security level provided and the capability of tampering localization. We obtain better (more precise) tampering localization but a lower security level for a smaller L_s, and vice versa. 4.3.2 Watermark Embedding We embed the watermark by enforcing the odd-even features of the non-uniform 8 × 8 blocks in f_s(x, y), with an even number of black pixels in a block embedding the bit 0 and an odd number the bit 1 . We skip the uniform blocks in f_s(x, y) to preserve the quality of the image after embedding. The reason is that, since empty areas (uniform blocks) are common or even dominant in binary document images, pixels in the uniform blocks of f_s(x, y) are likely to

be in the uniform blocks of f(x, y) too, especially for a small shifting amount. An alternative to the odd-even embedding is to force the number of black pixels in a block to be 2kQ to embed a 0 and (2k + 1)Q to embed a 1 , as mentioned in [13], where Q is a quantization step size; Q = 1 gives the odd-even embedding strategy. For a larger Q, this approach may provide some robustness while reducing the capability of tamper-proofing and resulting in greater visual distortion. Hence, Q = 1 is the best choice for fragile watermarking. Such an embedding strategy belongs conceptually to the class of quantization index modulation (QIM) methods [70]. Flipping of appropriate pixels is necessary when the bit to be embedded does not match the embedding block's odd-even feature, for example, when the number of black pixels in the embedding block is odd (even) while the bit to be embedded is 0 ( 1 ). M. Wu et al. [13] use a set of predefined rules or a precalculated look-up table to choose appropriate pixels to flip. While this is not difficult for 3 × 3 patterns, it becomes quite complicated when we want to consider a larger neighborhood for better discrimination. DRDM provides a fast and accurate way to evaluate the visual distortion resulting from flipping a pixel. It is able to consider an arbitrarily large neighborhood by choosing a large m, and it also correlates well with human visual perception. Therefore, when flipping is necessary, we choose the pixel (assuming flipped) with the lowest DRD value in the embedding block to flip. We obtain the odd-even features from the shifted image f_s(x, y), while for the blocks that need flipping, we calculate the DRD values in the original image f(x, y) to minimize the visual distortion in f(x, y) rather than in f_s(x, y). Necessary flipping is done in both f(x, y) and f_s(x, y) so that there is no need to do reverse shifting after the embedding.
The shift key K_s is used to find the mapping from the pixels in f_s(x, y) to the corresponding ones in f(x, y).
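The 2-D shifting described above can be sketched as follows (a minimal NumPy sketch; the packing of the key into consecutive L_s-bit shift amounts is an assumption, since the text only fixes the key length of (4.1)):

```python
import numpy as np

def shift_2d(img, key_bits, Ls, Ns):
    """2-D circular left-shifting driven by the shift key: Ns rows at a
    time for all M rows, then Ns columns at a time for all N columns.
    Each shift amount is read as the next Ls key bits, so the key must
    hold Ls * (ceil(M/Ns) + ceil(N/Ns)) bits, matching (4.1)."""
    out = img.copy()
    M, N = out.shape
    pos = 0

    def next_amount():
        nonlocal pos
        bits = key_bits[pos:pos + Ls]
        pos += Ls
        return int("".join(str(b) for b in bits), 2)

    for r in range(0, M, Ns):                        # rows first ...
        out[r:r + Ns] = np.roll(out[r:r + Ns], -next_amount(), axis=1)
    for c in range(0, N, Ns):                        # ... then columns
        out[:, c:c + Ns] = np.roll(out[:, c:c + Ns], -next_amount(), axis=0)
    return out
```

Because every shift is circular, the mapping is a permutation of the pixels, so the inverse shift with the same key recovers the original image exactly.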

(a) Watermark embedding. (b) Watermark extraction and authentication. Figure 4.2: The proposed fragile watermarking algorithm for binary document images based on DRDM. However, it is possible that the flipping changes the current embedding block in f_s(x, y) from a non-uniform block to a uniform one. Whenever this happens, we record the change in a counter, do the flipping in both f(x, y) and f_s(x, y), and postpone the embedding of the current bit to the next non-uniform block in f_s(x, y). This block thus becomes a uniform block in f_s(x, y) after flipping and is skipped in watermark extraction. We denote by N(p) the number of blocks changed from non-uniform to uniform when we embed W(p), the p-th bit of W, where p = 1, 2, 3, ..., L_W and L_W is the length of W. Thus, the bit W(p) is embedded into the q-th non-uniform 8 × 8 block in f_s(x, y), where q = p + N(p). After all necessary flipping in f(x, y) to embed W, we have the watermarked binary image g(x, y). The total number of blocks changed from non-uniform to uniform in f_s(x, y) is denoted as N_r.
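The per-block odd-even embedding rule can be sketched as below. The `neighbour_disagreement` scorer is a simplified stand-in for the DRD computation (the actual algorithm scores candidates with DRDM); names are illustrative:

```python
import numpy as np

def neighbour_disagreement(block, x, y):
    """Stand-in flip cost: how many valid 4-neighbours disagree with the
    flipped value. The algorithm in the text uses DRD here instead."""
    flipped = 1 - block[x, y]
    nbrs = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return sum(int(block[u, v] != flipped)
               for u, v in nbrs
               if 0 <= u < block.shape[0] and 0 <= v < block.shape[1])

def embed_bit(block, bit, cost):
    """Enforce the odd-even feature of one 8x8 block: an even black-pixel
    count embeds bit 0, an odd count embeds bit 1. If the parity already
    matches, the block is untouched; otherwise the cheapest pixel is flipped."""
    if int(block.sum()) % 2 == bit:
        return block.copy()
    best, best_cost = (0, 0), float("inf")
    for x in range(block.shape[0]):
        for y in range(block.shape[1]):
            c = cost(block, x, y)
            if c < best_cost:
                best, best_cost = (x, y), c
    out = block.copy()
    out[best] = 1 - out[best]
    return out
```

A single flip changes the black-pixel count by one, so exactly one flip always corrects the parity.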

4.3.3 Embedding Capacity We denote the number of non-uniform 8 × 8 blocks in f_s(x, y) as N_fs, and the number of 8 × 8 blocks in f_s(x, y) in which all but one pixel share the same value as Q_s. Then, the guaranteed embedding capacity C_g is given by

C_g = N_fs − Q_s    (4.2)

In practice, we can still embed (Q_s − N_r) more bits. From the discussion above, we have 0 ≤ N_r ≤ Q_s. 4.3.4 Watermark Extraction Watermark extraction is performed in a similar way to embedding. To extract the watermark Ŵ from a test binary image g(x, y), we do a 2-D circular left-shifting using K_s to obtain g_s(x, y). The odd-even features in all non-uniform 8 × 8 blocks of g_s(x, y) are extracted as Ŵ. This extracted watermark Ŵ is then compared with the original watermark W to give the authentication result; Ŵ = W only when g(x, y) is an intact watermarked image. 4.4 Experimental Results We use the image in Fig. 4.3(a) as the original image f(x, y). For programming convenience, we choose L_s = 3 and N_s = 8 in our experiments. Thus, L_ks = 384 and there are 4096 ways of shifting for this pair of L_s and N_s, which should be kept secret together with the shift key so that this information is not available to an adversary. We use the weight matrix W_N7 (m = 7) in embedding and it needs to be calculated only once. There are 2512 (≈ 61%) non-uniform 8 × 8 blocks out of 4096 in f(x, y). After the 2-D shifting using a randomly generated

key of length 384, we obtain f_s(x, y) with N_fs = 2539 and Q_s = 53. Therefore, the guaranteed embedding capacity C_g is 2486 bits, and we generate a random binary sequence of length L_W = 2486 as W for embedding. In the embedding, we have N_r = 23 (≈ 0.9%) after necessary flipping. Thus, we embed 30 more bits in the experiment and have 2516 bits of data embedded successfully. The image after embedding is shown in Fig. 4.3(b); there are 1247 pixels flipped in the image. From the figures, we can see that the image after embedding has good quality and the flipping is hard to perceive. The PSNR is 23.23 dB; the corresponding DRDM measure is computed with m = 5. To show how effective the DRDM scheme is in choosing the most suitable pixels to flip, we cut out the center portion of the original image, as shown in Fig. 4.4(a); the pixels flipped in this portion of the watermarked image are shown in Fig. 4.4(b) as black dots, with the original black pixels brightened. The flipped pixels are all on the contours of the characters and the flipping has little effect on the visual quality. We also simulate an active tampering to test the tamper-proofing ability of our algorithm, which is important for authentication. We modify the content of the watermarked image with the processing block size known but without knowledge of L_s, N_s and K_s, and try to detect the modification through the extracted watermark. If the block size is also kept secret, tampering becomes more difficult for an adversary and the algorithm becomes more secure. The tampering is a simple modification followed by a processing that forces the uniform/non-uniform property and odd-even features to agree with the original watermarked image. The image after such a simulated attack is shown in Fig. 4.5(a). We modify the number 300 (second word) at the fifth row (without counting the top

(a) Original binary document image. (b) Watermarked binary document image. Figure 4.3: Original and watermarked binary document images for the proposed fragile watermarking method based on DRDM.

(a) Original center portion. (b) Flipped pixels. Figure 4.4: Flipped pixels for the center portion of Fig. 4.3(b). row of title) to 390 , and the second word le at the third last row (without counting the bottom half row) to la . If the 2-D shifting were not used in the embedding, we would extract a watermark Ŵ exactly the same as W, presenting the test image as an intact watermarked image, which means that the tamper-proofing fails. The 2-D shifting makes such an attack detectable, as shown in Fig. 4.5(b), since the modification has changed the odd-even features of the shifted version of the watermarked image even though these features are intact in the version before shifting. There are 10 bit errors in the extraction for the example shown, which implies that the tampering has affected the odd-even features of at least 10 blocks in the shifted version. If the number of non-uniform blocks in the shifted version of the test image after an attack does not agree with that of the intact watermarked image, the length of the detected watermark will differ from the length of the embedded watermark, which is a clear indication that the test image is tampered. However, localization of the tampering is more challenging in this case, since the synchronization between the original watermark and the watermarked image is corrupted.

(a) Tampered image. (b) Tampering detected. Figure 4.5: Tamper proofing of the tampered image.

4.5 Chapter Summary A secure fragile watermarking algorithm for binary document images is proposed in this chapter. The algorithm is based on the distance-reciprocal distortion measure, which provides an efficient way to select pixels to flip in embedding. The visual distortion due to flipping is calculated online using a pre-calculated weight matrix of size m × m, and we can take the effect of a large neighborhood of pixels into account by choosing a larger m. The watermark is embedded by enforcing the odd-even features of non-uniform blocks, and a 2-D shifting technique is employed to provide security against tampering. Keeping the processing block size secret can make the proposed algorithm more secure. Experiments show that the watermarked image has good quality and that tampering can be detected and even localized in the extraction process.

Chapter 5 Watermarking through Biased Binarization In this chapter, the approach of DC component embedding in the Discrete Cosine Transform (DCT) domain proposed in [71] is analyzed and validated. From the insights gained in the derivation and the understanding of the approach, a new watermarking algorithm is proposed that embeds a watermark by biasing the binarization threshold directly; it adopts a blurring pre-processing from [71]. The new algorithm is much simpler than the one in [71] and provides more flexibility in controlling the quality of watermarked images and the robustness. Some robustness against cropping and additive noise is shown in the experiments.

5.1 Watermark Embedding in DC Components of DCT In [71], an algorithm is developed for binary images based on the embedding strategy proposed by J. Huang et al. in [72] for multi-level images. The original binary image is blurred through low-pass filtering to obtain a multi-level image, and the watermark is then embedded in the DC components of the DCT of the blurred image, followed by an inverse DCT (IDCT) and a binarization process with a biased binarization threshold. However, only experimental results are presented in [71], and the watermark used is a random number sequence with a Gaussian distribution rather than a digital bitstream representing meaningful information. Also, the watermark extraction process requires the presence of the original binary image. The aim of this section is to provide an analysis of the feasibility of DC component embedding for binary images, and to show that watermark embedding in DC components is impossible for binary images if the embedding is done directly on binary images, or if the binarization threshold is chosen to be simply the mid-point, i.e., the mean of the maximum and minimum intensities. Furthermore, the watermarking algorithm for binary images proposed in [71], including a blurring pre-processing and a post-embedding binarization with a biased threshold, is analyzed to demonstrate why the watermark can survive even after binarization.

5.1.1 The Feasibility of DC Component Embedding As defined in [73], an image f(x, y) of size N × N can be represented by its inverse DCT (IDCT) as follows:

f(x, y) = Σ_{u=0}^{N−1} Σ_{v=0}^{N−1} ρ(u) ρ(v) C(u, v) cos[(2x+1)uπ / 2N] cos[(2y+1)vπ / 2N]    (5.1)

where C(u, v) is the DCT coefficient and ρ is given as

ρ(u) = √(1/N) for u = 0;  √(2/N) for u = 1, 2, ..., N−1.    (5.2)

Let S_AC(x, y) be the sum of the contributions from all AC components (all values of u, v except u = v = 0). Then

f(x, y) = [ρ(0)]² C(0, 0) + S_AC(x, y)    (5.3)

where the DC component C(0, 0) is defined as [73]:

C(0, 0) = [ρ(0)]² Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} f(x, y) = (1/N) Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} f(x, y)    (5.4)

Suppose that only the DC component is modified, to C′(0, 0), to embed a watermark, and that the image after the modification of f(x, y) is f′(x, y). Thus,

f′(x, y) = [ρ(0)]² C′(0, 0) + S_AC(x, y)    (5.5)

Therefore, the change due to the modification is

Δf(x, y) = f′(x, y) − f(x, y) = [ρ(0)]² ΔC(0, 0)    (5.6)

where

ΔC(0, 0) = C′(0, 0) − C(0, 0)    (5.7)

From the above equations, Δf(x_i, y_j), the change in the intensity of a particular pixel f(x_i, y_j), is the same constant for all pixels in the image f(x, y) and is independent of the position (x_i, y_j). Therefore, after the modification,

f′(x, y) = f(x, y) + const    (5.8)

For binary image watermarking, when f′(x, y) is not binary, it has to be binarized using a binarization threshold T_bi to obtain the watermarked binary image g(x, y). Usually, uniform (all black/white) image blocks are not watermarked, for imperceptibility. We observe the following two properties regarding binary image watermarking through DC component modification. Property 1. For a binary image f(x, y) such that f(x_i, y_j) ∈ {0, 1}, where i, j = 0, 1, ..., N−1, a watermark embedded in f(x, y) through DC modification cannot survive as long as the binarization threshold chosen lies between the maximum intensity (I′_max) and the minimum intensity (I′_min) of f′(x, y). Proof. From (5.8) above, if f(x, y) is binary, then f′(x, y) ∈ {const, const + 1} has only two levels too, and for any binarization threshold T_bi between I′_min (= const) and I′_max (= const + 1), i.e., const < T_bi < const + 1, the watermarked image is

g(x_i, y_j) = 0 if f(x_i, y_j) = 0;  1 if f(x_i, y_j) = 1.

This means that the watermarked binary image is identical to the original binary image and the embedded watermark is removed completely. For any other T_bi, i.e., T_bi ≥ const + 1 or T_bi ≤ const, g(x, y) becomes a uniform black/white image,

which is not acceptable for imperceptibility. Property 2. For a gray-level image f(x, y), DC component modification has no effect on the image after binarization if a mid-point threshold is used. Proof. Suppose a gray-level image f(x, y) is to be binarized using the mid-point threshold T_bi = (I_max + I_min)/2, where I_max and I_min are the maximum and minimum intensities in f(x, y), respectively. From (5.8), the maximum and minimum intensities of f′(x, y) satisfy the same relation: I′_max = I_max + const and I′_min = I_min + const, so the mid-point threshold T′_bi of f′(x, y) is

T′_bi = (I′_max + I′_min)/2 = (I_max + I_min)/2 + const = T_bi + const    (5.9)

We can see that

f′(x_i, y_j) > T′_bi if f(x_i, y_j) > T_bi;  f′(x_i, y_j) = T′_bi if f(x_i, y_j) = T_bi;  f′(x_i, y_j) < T′_bi if f(x_i, y_j) < T_bi.

Therefore, if a mid-point threshold is used, doing the binarization before or after the modification of the DC component yields the same binary output image, which means that the watermark embedding is meaningless and the embedding will fail.
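Both observations can be checked numerically. The sketch below (NumPy/SciPy assumed) verifies that modifying only the orthonormal DCT's DC term adds the constant [ρ(0)]²ΔC = ΔC/N of (5.6) to every pixel, and that mid-point binarization is invariant to such a constant shift, as Property 2 states:

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)
f = rng.random((8, 8))                   # a gray-level 8x8 block

# (5.6): changing only C(0,0) shifts every pixel by dC * [rho(0)]^2 = dC/N
C = dctn(f, norm="ortho")
C[0, 0] += 4.0                           # modify the DC component only
f_mod = idctn(C, norm="ortho")
assert np.allclose(f_mod - f, 4.0 / 8)   # constant shift of dC/N = 0.5

# Property 2: mid-point binarization is unaffected by a constant shift
def midpoint_binarize(img):
    t = (img.max() + img.min()) / 2
    return (img >= t).astype(int)

for const in (-0.3, 0.25, 0.7):
    assert np.array_equal(midpoint_binarize(f), midpoint_binarize(f + const))
```

The `norm="ortho"` transform matches the normalization of (5.1)-(5.2), so C(0, 0) equals the pixel sum divided by N, as in (5.4).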

Figure 5.1: DC components embedding for binary images by X. Shi (2001). 5.1.2 Successful Watermark Embedding in DC Components Based on the two properties derived above, we analyze the DC component watermark embedding system designed in [71] to see why it works. As shown in Fig. 5.1, the watermarking algorithm proposed in [71] for binary images is similar to that proposed in [72], except that there are a pre-processing step that blurs the input binary image into a gray-level image, and a post-processing step that binarizes the image after embedding back into a binary image. These two steps are described below. The blurring pre-processing obtains a gray-level image from the input binary image; a low-pass filtering works fine for this purpose. This pre-processing is necessary to avoid the failure of watermark embedding in DC components due to Property 1 discussed in the previous section. The post-embedding binarization is another critical step for successful watermarking. Binarization ensures that the watermarked image is still a binary image, keeping the image file size small. As discussed above, a mid-point threshold will lead to embedding failure. Therefore, it is necessary to find a suitable thresholding method such that the embedded watermark can survive the binarization, yet the resulting distortion is not obtrusive. The work in [71] succeeds by introducing a bias B_bi in determining the binarization threshold. Below we give a brief description of the watermarking algorithm.

1. The original image f(x, y) of size M × N is low-pass filtered using a Gaussian filter with a window of size 5 × 5 and a standard deviation of 1 to obtain the blurred version f_b(x, y). This blurred image is then split into non-overlapped blocks of 8 × 8. 2. The non-uniform 8 × 8 blocks in the original image f(x, y) are identified, and the uniform blocks (all black/white) are skipped in embedding for imperceptibility. Denote each block in f_b(x, y) corresponding to a non-uniform block in f(x, y) as f_bk(r, s), r, s = 0, 1, ..., 7, and k = 0, 1, ..., N_f − 1, where N_f is the number of non-uniform 8 × 8 blocks in f(x, y). 3. Each block f_bk(r, s) is DCT transformed to get

C_bk(u, v) = DCT{f_bk(r, s)},  0 ≤ u, v < 8    (5.10)

The watermark W_G, with length L_W = N_f, is a random number sequence with Gaussian distribution N(0, 1). The watermark is embedded one element per block by modifying the DC value in C_bk(u, v) as:

C′_bk(u, v) = C_bk(u, v) (1 + α W_G(k)) if u = v = 0;  C_bk(u, v) otherwise.    (5.11)

where α is a scaling factor. 4. The image block is IDCT transformed to obtain f′_bk(r, s), the gray-level image block after embedding:

f′_bk(r, s) = IDCT{C′_bk(u, v)}    (5.12)

This gray-level image block is then binarized to obtain the watermarked binary image block f_bbk(r, s) using a biased threshold T_bibk:

f_bbk(r, s) = 0, if f′_bk(r, s) < T_bibk;  1, if f′_bk(r, s) ≥ T_bibk.    (5.13)

and

T_bibk = (I′_maxk + I′_mink) (0.5 − B_bi)    (5.14)

where I′_maxk and I′_mink are the maximum and minimum intensities in the block image f′_bk(r, s), respectively, and B_bi is the bias in the binarization (0 < B_bi < 0.5). A large value of B_bi results in better robustness but poorer image quality, and vice versa. Therefore, we prefer a small value of B_bi for less visual distortion; a typical choice of B_bi for text document images was determined experimentally. 5. The whole watermarked image g(x, y) is then obtained by replacing the N_f non-uniform 8 × 8 blocks in f(x, y) with the modified ones. In the watermark detection for a test binary image g(x, y), the original binary image f(x, y) is required. From f(x, y), the non-uniform 8 × 8 blocks are identified and the corresponding blocks in g(x, y) are used to extract the watermark Ŵ_G. The original image is blurred using the same Gaussian filter to get f_b(x, y) for the extraction. Let C_k(u, v) denote the DCT of the corresponding block g_k(x, y); then

Ŵ_G(k) = C_k(0, 0) − C_bk(0, 0)    (5.15)
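Step 3's DC modification can be sketched per block as follows (SciPy's orthonormal DCT is assumed as the transform; the function name is illustrative):

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_dc(block, w_k, alpha=90.0):
    """Multiplicative DC embedding in the style of (5.11) on one blurred
    8x8 block: only C(0,0) is scaled by (1 + alpha * w_k); all AC terms
    are left untouched."""
    C = dctn(block, norm="ortho")
    C[0, 0] *= 1.0 + alpha * w_k
    return idctn(C, norm="ortho")
```

On a uniform block, all energy sits in the DC term, so the embedding reduces to scaling every pixel by (1 + α W_G(k)); on general blocks it adds the spatial constant of (5.6).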

The normalized correlation between Ŵ_G and W_G is calculated to determine whether g(x, y) is a watermarked copy:

corr(Ŵ_G, W_G) = Σ_{k=0}^{N_f−1} W̃_G(k) W_G(k) / √( Σ_{k=0}^{N_f−1} W̃_G(k)² · Σ_{k=0}^{N_f−1} W_G(k)² )    (5.16)

where W̃_G(k) = Ŵ_G(k) − mean(Ŵ_G) has zero mean, with mean(Ŵ_G) denoting the mean of Ŵ_G. A correlation/similarity threshold T_wm is chosen based on experimental results to make the decision: Ŵ_G is classified as a corrupted version of the true watermark W_G if corr(Ŵ_G, W_G) > T_wm. To see why the watermark can survive, the effects of the bias in the binarization need to be examined. Suppose that T_midk is the mid-point of f_bk(r, s) and T′_midk is the mid-point of f′_bk(r, s) after the embedding of W_G(k). From (5.6), (5.7), (5.9) and (5.11), we have

T′_midk = T_midk + [ρ(0)]² [C′_bk(0, 0) − C_bk(0, 0)] = T_midk + (1/8) [α W_G(k) C_bk(0, 0)] = T_midk + [α C_bk(0, 0)/8] W_G(k) = T_midk + D_k W_G(k)    (5.17)

where D_k = α C_bk(0, 0)/8 is a constant for the image block. From (5.14),

T_bibk = (I′_maxk + I′_mink) (0.5 − B_bi) = (I′_maxk + I′_mink)/2 − (I′_maxk + I′_mink) B_bi = T′_midk − 2 T′_midk B_bi    (5.18)

Suppose we have |D_k W_G(k)| > T_midk. If W_G(k) > 0, then D_k W_G(k) > T_midk and, from (5.17), T′_midk > 0. According to (5.18), the binarization threshold is lowered by 2 T′_midk B_bi. A smaller threshold raises the probability of

an increased number of 1 s (white pixels) after the binarization, which results in an increased DC value according to (5.4). Similarly, if W_G(k) < 0, then D_k W_G(k) < −T_midk, so T′_midk < 0 and the binarization threshold is raised by 2 |T′_midk| B_bi. A larger threshold raises the probability of an increased number of 0 s (black pixels) after the binarization, which results in a decreased DC value. Therefore, a positive W_G(k) tends to raise the DC value and a negative W_G(k) tends to lower it. Furthermore, the larger the magnitude of W_G(k) is, the stronger such tendencies will be. This explains why the embedded watermark can survive, although not every W_G(k) may result in a DC value change and succeed in the embedding. On the other hand, larger magnitudes of α and B_bi strengthen such tendencies too, while causing more visual distortion at the same time. We compute the probability of |D_k W_G(k)| > T_midk to get

P[|D_k W_G(k)| > T_midk] = P[(α/8) C_bk(0, 0) |W_G(k)| > T_midk] = P[|W_G(k)| > (8/α) T_midk / C_bk(0, 0)]

Let I_maxk and I_mink denote the maximum and minimum intensities in the block f_bk(r, s), respectively. We have T_midk = (I_mink + I_maxk)/2, and from (5.4)

C_bk(0, 0) = ( Σ_{r=0}^{7} Σ_{s=0}^{7} f_bk(r, s) ) / 8 ≥ (63 I_mink + I_maxk)/8 = (62 I_mink + 2 T_midk)/8

In a binary image, a pixel value is either 0 or 1. After low-pass filtering, the blurred image has pixel values ranging from 0 to 1. Therefore, f_bk(r, s) ≥ 0 and I_mink ≥ 0. Then we have C_bk(0, 0) ≥ (2 T_midk)/8 = T_midk/4. Hence, 0 ≤ T_midk/C_bk(0, 0) ≤ 4.
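Assuming W_G(k) ~ N(0, 1), this success probability can be evaluated in closed form; the sketch below (SciPy assumed) reproduces the quantity plotted in Fig. 5.2:

```python
import numpy as np
from scipy.stats import norm

def success_probability(ratio, alpha=90.0):
    """P[|W_G(k)| > (8/alpha) * ratio] for W_G(k) ~ N(0, 1), where
    `ratio` = T_midk / C_bk(0,0), which lies in [0, 4]."""
    return 2.0 * norm.sf((8.0 / alpha) * ratio)
```

Even at the worst-case ratio of 4 the probability stays above 0.7 for α = 90, and in real images the ratios concentrate at the lower end, so most blocks embed successfully.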

Figure 5.2: The probability of |D_k W_G(k)| > T_midk against T_midk/C_bk(0, 0) for α = 90. Since W_G has a normal distribution, when a large α (= 90) is selected, we get a high probability of |D_k W_G(k)| > T_midk, indicating that the watermark is very likely to be embedded successfully. Fig. 5.2 shows the variation of this probability over the full range of T_midk/C_bk(0, 0). It should be noted that in real images the values of T_midk/C_bk(0, 0) are concentrated at the lower end of the range, as shown in the histogram in Fig. 5.3, which gives the distribution of T_midk/C_bk(0, 0) among the non-uniform 8 × 8 blocks of all eight CCITT standard binary test images. 5.1.3 Experimental Results The algorithm has been tested on the text image shown in Fig. 5.4(a). This test image is watermarked using the algorithm described above with α = 90 and a value of B_bi chosen for a balance between robustness and imperceptibility, and the watermarked image is shown in Fig. 5.4(b). There are 2512 (≈ 61.3%) non-uniform blocks available for watermark embedding. Hence

Figure 5.3: The count of the non-uniform 8 × 8 blocks in the eight CCITT binary test images over the range of T_midk/C_bk(0, 0). the watermark length L_W = 2512. In the embedding, 2488 out of 2512 (≈ 99%) blocks satisfy |D_k W_G(k)| > T_midk, which agrees with the analysis in the previous section. In the watermarked image, 4182 (≈ 1.595%) pixels are flipped. We can find through visual inspection that some of the holes in some characters, such as a , e , g and s , are filled completely by the embedding process, which is not desired. The detector response in Fig. 5.5 shows the correlations between the extracted watermark and the true or false watermarks. The 500th watermark is the true one and all the others are randomly generated false watermarks. We can see that the visual distortion is not obtrusive and that the detector gives a quite strong response even after the binarization process, which is a very strong interference, as pointed out in [8]. From the detector response, the threshold T_wm in the detection is safely set to 0.1.
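The detector decision of (5.16) amounts to a normalized correlation after mean removal; a minimal sketch:

```python
import numpy as np

def detector_response(w_hat, w):
    """Normalized correlation in the style of (5.16): the extracted
    sequence is made zero-mean before correlating with the candidate
    watermark, so the response lies in [-1, 1]."""
    w_hat = np.asarray(w_hat, float) - np.mean(w_hat)
    w = np.asarray(w, float)
    return float(w_hat @ w / np.sqrt((w_hat @ w_hat) * (w @ w)))
```

The true watermark yields a response far above T_wm = 0.1, while independent random watermarks give responses near zero, which is why a low threshold can be set safely.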

(a) Original binary image. (b) Watermarked binary image. Figure 5.4: Original and watermarked binary images for the method by X. Shi (2001).

Figure 5.5: Detector response for the watermarked image of Fig. 5.4(b). 5.2 Watermarking through Biased Binarization without Using DCT Based on the insights from the analysis of the DC component embedding algorithm presented in the previous sections, we can see that the DC component modification is in effect a spatial-domain technique: modifying the DC components is equivalent to adding a constant value to all pixels in the spatial domain. Hence we propose a more efficient watermarking algorithm based on this observation. In our new approach, the original binary image is blurred to a gray-level image to enable embedding, as in [71]. However, DCT and IDCT (inverse DCT) are not involved, which greatly simplifies the watermarking procedure. The embedding is done by using the watermark information to bias the binarization threshold directly, i.e., the binarization procedure is the same as the embedding procedure, unlike in [71], where the binarization procedure follows the embedding procedure. A loop is used to control the quality of the watermarked binary image and the robustness.

(a) Watermark embedding. (b) Watermark extraction. Figure 5.6: The proposed watermarking method using biased binarization.

A feature vector is extracted as a key to be used in watermark extraction, so that the original binary image is no longer necessary in the extraction. The watermark is a digital bitstream representing any kind of digital information rather than a random number sequence. To improve the extraction accuracy, the watermark is coded with an error correction code (ECC). Experiments show that the visual distortion in the watermarked image is not obtrusive and that the algorithm provides some degree of robustness against cropping and additive noise. Fig. 5.6 shows the system flow of the proposed algorithm. We embed the watermark W into the original binary image f(x, y) of size M × N to obtain the output image g(x, y). 5.2.1 Blurring We have shown that it is not possible to successfully embed a watermark directly into a binary image through binarization, no matter what threshold is

used. Blurring is necessary to produce pixels with various intensities so that the embedding can succeed through biased binarization. We have observed that the algorithm in [71] is likely to produce obtrusive noise near the edges of the image. The cause is that when the 5 × 5 Gaussian low-pass filter processes pixels near the image edges, where the 5 × 5 window lies only partially within the image, the part of the window outside the image is treated as having pixel value 0 (black). However, black pixels near the image edges are obvious against a white background. To solve this problem, we expand the original image by 2 pixels of the background color (white) at each of the four image edges before blurring. Thus, the original image f(x, y) is expanded with white pixels (two at each image edge) to an image f_e(x, y) of size (M + 4) × (N + 4). The expanded image f_e(x, y) is then low-pass filtered using a Gaussian filter with a window of size 5 × 5 and a standard deviation of σ to produce f_eb(x, y), from which the blurred version f_b(x, y) is obtained by discarding the two pixels at each image edge. We choose σ = 0.7 in our experiments (smaller than the value used in [71]). A larger σ offers better robustness but poorer quality, and vice versa. The blurred image f_b(x, y) is then split into non-overlapped 8 × 8 blocks. 5.2.2 Watermarking through Direct Biased Binarization As usual, we skip the blocks in f_b(x, y) corresponding to the uniform (all black/white) blocks in f(x, y) to preserve the quality of the image after embedding. The watermark is embedded by binarizing the blocks in f_b(x, y) that correspond to the non-uniform 8 × 8 blocks in f(x, y) with biased thresholds. We denote the number of non-uniform 8 × 8 blocks in f(x, y) as N_f and each corresponding block in f_b(x, y) as f_bk(r, s), where r, s = 0, 1, ..., 7, and

k = 0, 1, ..., N_f − 1. The watermark W is a bit stream of 0 s and 1 s, instead of a random number sequence as in [71], and it is encoded with BCH(31,6) [52] to reduce the extraction error. The coded watermark W_c is of length L_Wc ≤ N_f. As mentioned, the algorithm in [71] requires the original image in watermark extraction, which may not be convenient in practice. In our proposed algorithm, we eliminate this limitation by extracting a key to be used in the extraction. This key, K_N, is extracted as the number of white pixels in each block of f(x, y) (both uniform and non-uniform), and it is of length L_KN, where L_KN is equal to the total number of 8 × 8 blocks in f(x, y). For each block f_bk(r, s), the maximum and minimum intensities are I_maxk and I_mink, respectively. The initial value of the binarization bias B_k depends on the watermark bit W_c(k) as follows:

B_k = 0.05, if W_c(k) = 1;  −0.05, if W_c(k) = 0.    (5.19)

This bias is adjusted through a loop to control the amount of visual distortion and the robustness, as shown in Fig. 5.7. In the figure, C_L is a counter initialized to 0, and the maximum number of iterations is limited to M_L. The binarization threshold T_bk is calculated as follows:

T_bk = (I_maxk + I_mink) (0.5 − B_k)    (5.20)

Thus, bit 1 in W_c will lower the threshold and bit 0 in W_c will raise it. The

block f_b,k(r, s) is then binarized to g_k(r, s) using T_b,k:

    g_k(r, s) = { 0, if f_b,k(r, s) < T_b,k
                { 1, if f_b,k(r, s) ≥ T_b,k.                       (5.21)

Therefore, a lowered threshold (by bit 1 in W_c) tends to increase the number of white ("1") pixels and a raised threshold (by bit 0 in W_c) tends to reduce it. We denote the number of white pixels in f_k(r, s) and that in g_k(r, s) as D_f,k and D_g,k, respectively. For successful embedding, the following condition must hold:

    D_g,k < D_f,k, if W_c(k) = 0
    D_g,k > D_f,k, if W_c(k) = 1.                                  (5.22)

If this condition is not satisfied after the biased binarization, we increase the bias to λ_I · B_k (λ_I > 1) until the condition is satisfied or C_L ≥ M_L. On the other hand, even when the condition in (5.22) is satisfied, the visual distortion in the block may be too large. Hence, for better quality, we reduce the amount of bias when this happens. Denoting the maximum acceptable number of flipped pixels in a block as M_c, we reduce the bias to λ_D · B_k (0 < λ_D < 1) if |D_g,k − D_f,k| > M_c. A larger M_c provides better robustness at the cost of poorer visual quality. In our experiments, we choose λ_I = 1.5 and λ_D = 0.5. Inappropriate values of λ_I and λ_D may require more iterations to reach satisfactory results. In case the condition in (5.22) is still not satisfied after the loop, we keep g_k(r, s) = f_k(r, s), where f_k(r, s) is the corresponding non-uniform 8 × 8 block in f(x, y). The watermarked binary image g(x, y) is obtained by replacing f_k(r, s)

Figure 5.7: Loop to control the embedding in the proposed watermarking algorithm using biased binarization.
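The control loop of Fig. 5.7 can be sketched in pure Python. This is an illustrative reading of Eqs. (5.19)-(5.22), not the thesis implementation; the function name and the nested-list block representation are our own, while the parameter values (M_L = 20, M_c = 2, λ_I = 1.5, λ_D = 0.5) follow the text.

```python
def embed_block(fb_block, d_f, bit, m_l=20, m_c=2, lam_i=1.5, lam_d=0.5):
    """Binarize one blurred 8x8 block with a biased threshold so that its
    white-pixel count moves in the direction encoded by the watermark bit."""
    i_max = max(max(row) for row in fb_block)
    i_min = min(min(row) for row in fb_block)
    b = 0.05 if bit == 1 else -0.05                # initial bias, Eq. (5.19)
    for _ in range(m_l):                           # counter C_L capped at M_L
        t = (i_max + i_min) * (0.5 - b)            # biased threshold, Eq. (5.20)
        g = [[1 if v >= t else 0 for v in row] for row in fb_block]  # Eq. (5.21)
        d_g = sum(map(sum, g))
        ok = d_g > d_f if bit == 1 else d_g < d_f  # success condition, Eq. (5.22)
        if not ok:
            b *= lam_i                             # condition failed: strengthen bias
        elif abs(d_g - d_f) > m_c:
            b *= lam_d                             # too many flips: weaken bias
        else:
            return g, True
    return None, False                             # caller keeps the original block
```

For a block with a smooth intensity gradient, the loop typically converges in a few iterations; when it fails, the algorithm keeps the original block, g_k(r, s) = f_k(r, s).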

in f(x, y) with g_k(r, s).

5.2.3 Watermark Extraction

Watermark extraction is a simple process. To extract the watermark Ŵ_c from a test binary image g(x, y), we split g(x, y) into 8 × 8 blocks. The key K_N is required, and each element in K_N corresponds to an 8 × 8 block in g(x, y). If the element from K_N is either 0 or 64, the corresponding block in g(x, y) is skipped since there is no embedding in uniform blocks. Otherwise, a 1 is extracted if the number of white pixels in the block is greater than the value of the element from K_N, and a 0 is extracted otherwise.

5.2.4 Experimental Results

The original image f(x, y) is shown in Fig. 5.8(a). Its size is 512 × 512. As previously shown, there are 2512 (≈61%) non-uniform 8 × 8 blocks out of 4096 in f(x, y). Thus, the key length L_KN = 4096 and N_f = 2512. We choose M_L = 20 and M_c = 2 for the control loop in our experiments. We generate a random bit stream W of length 486, and after BCH(31,6) coding we have the watermark W_c of length L_Wc = 2511 for embedding. The image after watermark embedding is shown in Fig. 5.8(b). There are 3268 (≈1.25%) pixels flipped in the image. Although the measured quality is not very good, with PSNR = 19.04 dB and DRD = 0.58 for m = 5, visual inspection shows that the watermarked image is still of good quality and the visual distortion is not obtrusive. Fig. 5.9 shows the pixels flipped due to the watermark embedding. The original binary image is brightened and the flipped pixels are shown as black dots. We can see that most of the flipped pixels are near the contours of the characters, and the hole-filling defects in the results

of the preceding section do not occur here, owing to the loop's control of the quality. Thus, the embedding does not affect the quality of the binary document image much. There are 147 (≈5.85%) bit errors in the extracted Ŵ_c; nevertheless, there is no error in the decoded Ŵ, thanks to the BCH coding. The robustness against cropping and additive noise is shown in Fig. 5.10 and Fig. 5.11, respectively. The cropping test is implemented as in [8], where a number of rows are cropped from the watermarked image and the cropped portion is inserted into the original image to extract the watermark. The robustness against additive noise is tested by adding Gaussian white noise of mean 0 and variances ranging from 0.01 to 0.1. The image after adding noise is a gray-level image and needs to be binarized with a mid-point threshold. The ratio of pixels changed (flipped) in the watermarked binary image before and after adding noise is shown by the dash-dot line in Fig. 5.11. We can see that the ECC coding, BCH(31,6), is more effective against random additive noise than against cropping, and its effectiveness (improvement) decreases as the amount of additive noise increases.

5.3 Chapter Summary

First, this chapter analyzes the feasibility of watermark embedding in the DC components of the DCT for binary images and gives a validation of the algorithm proposed in [71]. It has been shown that direct embedding in the DCT DC components of binary images is not feasible. Moreover, a mid-point threshold in binarization will remove the watermark information embedded in the DC components even if the original image is first converted to a gray-level image. These two properties help to explain why the algorithm in [71] can work. The blurring

(a) Original binary image. (b) Watermarked binary image. Figure 5.8: Original and watermarked binary images for the proposed watermarking method using biased binarization.
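The keyed extraction rule of Section 5.2.3 amounts to comparing each block's white-pixel count against the corresponding key entry; a minimal sketch (the function name and scan order are our own assumptions):

```python
def extract_bits(image, key):
    """Extract watermark bits from a binary image split into 8x8 blocks, using
    the key K_N (white-pixel count of each original block, in scan order).
    Key entries 0 and 64 mark uniform blocks, which carry no payload."""
    h, w = len(image), len(image[0])
    bits, idx = [], 0
    for by in range(0, h, 8):
        for bx in range(0, w, 8):
            k = key[idx]
            idx += 1
            if k in (0, 64):      # uniform block in the original: skipped
                continue
            whites = sum(image[y][x]
                         for y in range(by, by + 8) for x in range(bx, bx + 8))
            bits.append(1 if whites > k else 0)
    return bits
```

Because only the per-block white counts of the original are needed, the extractor works from the key alone, without the original image itself.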

Figure 5.9: Pixels flipped in the watermarked image of Fig. 5.8(b).

Figure 5.10: Results of watermarking through biased binarization for cropping.

Figure 5.11: Results of watermarking through biased binarization for additive white Gaussian noise degradation.

pre-processing transforms a binary image into a gray-level one, removing the first obstacle, and the introduction of a biased binarization threshold overcomes the second. The experimental results have validated our analysis. Based on the analysis of the DC-component embedding algorithm, we propose a more efficient watermarking algorithm. The original binary image is blurred to a gray-level image to enable embedding, and the watermark bit stream is then embedded by directly biasing the binarization threshold, without DCT and IDCT. A loop is used to control the quality of the watermarked image and the robustness. A key is extracted for watermark extraction so that the original binary image is not required in the extraction. For higher extraction accuracy, an error correction code is used. Experimental results show that the visual distortion in the watermarked binary image is not obtrusive and that the algorithm has some robustness against cropping and additive noise.

Chapter 6

Watermarking in Frequency Domain

This chapter studies the frequency-domain approach for binary images. Unlike for multi-level images, watermarking in the frequency domain is extremely hard for binary images, because only two levels are available and because of the strong interference of the binarization process, which is necessary after modifying the frequency-domain characteristics. Through a deeper understanding of the effects of frequency-domain modification on binary images and of the binarization process, we succeed in developing a watermarking algorithm for binary images that operates in the frequency domain.

6.1 Difficulties in the Frequency-Domain Approach

Although the frequency-domain approach is very popular in multi-level image watermarking, among all the works on binary image watermarking in the literature, there is no successful watermarking algorithm operating entirely in the frequency domain (as pointed out earlier, the algorithm in [71] is in fact

a spatial-domain approach, since the DC component represents only the sum of the spatial intensities). Y. Liu et al. [8] have made an attempt to watermark binary images in the transform domain using the discrete cosine transform (DCT) watermarking algorithm proposed by I. Cox et al. [30]. They have shown by experiments that cleaning the background, i.e., setting all pixels with an intensity below a threshold to white, attenuates the strength of the watermark. They show further that binarizing such a watermarked image, which goes one step further by setting all pixels with an intensity above the threshold to black, completely destroys the watermark embedded in a text image, without any other processing or attack, for a wide range of chosen threshold values.

The DCT (frequency-domain) characteristics of binary images are quite different from those of gray-level natural images. For gray-level images, most of the energy in the DCT domain is concentrated in the lower AC components [30], since the variation of intensities is mostly gradual. For binary images, however, the energy distribution in the DCT domain is more random due to the binary nature. The DCT of a binary image does not exhibit the energy concentration that it does for gray-level images. This partly explains why the algorithms developed for gray-level images in the frequency domain are not applicable to binary images. To illustrate this point, we show the DCT-domain characteristics of a gray-level (8-bit) photo image lena and of a binary text image in Fig. 6.1(a) and Fig. 6.2(a), respectively. Both images are of the same size. In order to show their DCT-domain characteristics clearly, one block is cropped from each of the two images, as shown in Fig. 6.1(b) and Fig. 6.2(b). They are transformed into the DCT domain and the DCT spectra are shown as images in Fig. 6.1(c) and Fig. 6.2(c).
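This energy-concentration contrast is easy to reproduce numerically. The sketch below (pure Python, naive O(N^4) DCT kept only for illustration; the helper names and the two example blocks are our own) compares the fraction of AC energy falling in the low band u + v ≤ 2 for a smooth ramp block and a random binary block:

```python
import math
import random

def dct2(block):
    """Naive orthonormal 2-D DCT-II of a square block (O(N^4), demo only)."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            cu = math.sqrt((1 if u == 0 else 2) / n)
            cv = math.sqrt((1 if v == 0 else 2) / n)
            out[u][v] = cu * cv * s
    return out

def low_ac_fraction(block, band=2):
    """Fraction of the AC energy carried by coefficients with u + v <= band."""
    c = dct2(block)
    n = len(c)
    total = sum(c[u][v] ** 2 for u in range(n) for v in range(n)) - c[0][0] ** 2
    low = sum(c[u][v] ** 2 for u in range(n) for v in range(n) if 0 < u + v <= band)
    return low / total if total else 0.0

random.seed(1)
smooth = [[(x + y) / 14 for y in range(8)] for x in range(8)]          # gradual ramp
noisy = [[random.randint(0, 1) for _ in range(8)] for _ in range(8)]   # random bi-level
# The ramp concentrates nearly all AC energy in the low band; the random
# binary block spreads it across the whole spectrum.
```

Running this, the smooth block puts the overwhelming majority of its AC energy in the low band, while the random binary block does not, mirroring the contrast between Fig. 6.1(c) and Fig. 6.2(c).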
The magnitudes of the AC components are shown as intensity levels (black for 0 and white for maximum value) and

(a) Lena. (b) A portion (zoomed in). (c) DCT of the portion. Figure 6.1: AC coefficient magnitudes of DCT for a gray-level image.

the DC components are set to 0 for better viewing. It can be observed from the two images that the DCT spectrum of the binary image is more random than that of the gray-level image.

6.2 Important Considerations on the Frequency-Domain Approach

When we began our study of the frequency-domain approach for binary image watermarking, we reached the same conclusion: watermarking algorithms

(a) Binary text image. (b) A portion (zoomed in). (c) DCT of the portion. Figure 6.2: AC coefficient magnitudes of DCT for a binary image.

developed for gray-level/color images are not suitable for binary images. When we tried to apply the algorithms in [74] and [30] to binary images, we found that the embedded watermarks are easily removed by the binarization process. However, as we continued our investigation, we successfully developed a frequency-domain watermarking algorithm for binary images. We noticed that a binary image watermarked using the techniques for gray-level/color images, before binarization, is still a grayscale image and has good imperceptibility. Therefore, we reasoned that if we embed much stronger information into the image, we will get a severely distorted grayscale image. However, what we need is a watermarked binary image

rather than a grayscale image. Hence, the distorted grayscale image can be binarized and a much less distorted binary image can be obtained. Several considerations are crucial to the success of our frequency-domain approach.

6.2.1 Block Size in Processing

In our study of the DCT-domain approach, we have tried various block sizes, ranging from 4 × 4 to 64 × 64, in the block processing. Due to the irregularity of the DCT-domain characteristics of binary images, modification of the AC coefficients of the DCT is more likely to result in obtrusive visual distortion in the spatial domain, compared with the visual distortion resulting from popular spatial-domain approaches. Therefore, we choose a smaller block size in the block processing so that the visual distortion can be constrained to a smaller area. Based on our experimental study, we choose a processing block size of 4 × 4, rather than the conventional 8 × 8. A smaller processing block size limits the visual distortion introduced by watermark embedding to a smaller area, making it less obtrusive.

6.2.2 Observation on Non-Embeddable Blocks

It is observed that, unlike in gray-level images, AC embedding cannot always succeed for binary image blocks: in some blocks we can hardly embed any information through the modification of DCT AC coefficients, no matter how strong the watermark signal is. These blocks are called the non-embeddable blocks in our study. A study of these non-embeddable blocks reveals that they share a common characteristic: most of the AC coefficients of their DCT are zeros. Uniform blocks (all black/white) are the extreme cases in which all of the

AC coefficients are zeros. Therefore, we should avoid embedding information in these blocks. For convenience of discussion, we call all blocks with fewer than 8 nonzero AC coefficients the zero-AC-dominant blocks, and all blocks with at least 8 nonzero AC coefficients the nonzero-AC-dominant blocks. In our algorithm, we skip all zero-AC-dominant blocks to avoid unsuccessful embedding.

6.2.3 The Embedding Strength

To survive the binarization process necessary in a frequency-domain approach, the strength of embedding, which is the scaling factor α (α > 0) in our algorithm, has to be significantly larger than the values chosen in watermark embedding for gray-level/color images; e.g., in [30], α is chosen to be 0.1, while we prefer 2 ≤ α ≤ 3 in our algorithm. Otherwise, the embedded information could be completely erased by binarization, as verified by the author and in other literature [8, 20].

6.2.4 Exhaustive Study on Blocks of 4 × 4

Since we have chosen a processing block size of 4 × 4, it is possible to do an exhaustive study on all possible blocks for parameter selection and further enhancement. There are 2^(4×4) = 2^16 = 65536 possible such blocks, among which one is all black ("0") and one is all white ("1"). Therefore, we study the 65534 non-uniform blocks. We define the following quantities to be studied:

α: the scaling factor (chosen experimentally).

V_Num: the number of blocks (out of 65534) that can be embedded (i.e., that differ from the original after watermark embedding).

V_Ratio: the percentage of blocks that can be embedded, i.e., V_Num / 65534.

F_Max: the maximum number of flipped pixels in a block.

F_Ave: the average number of flipped pixels over all blocks.

DRD_Max: the maximum DRD value (m = 3) in a block. We use m = 3 rather than 5 or an even larger m because the block itself is small (only 4 × 4).

DRD_Ave: the average DRD value (m = 3) over all blocks.

ER: the error rate in extraction, i.e., the percentage of blocks from which the embedded bit cannot be extracted.

6.3 A Simple Watermarking Algorithm in the DCT Domain

We first developed a simple watermarking algorithm in the DCT domain based on the algorithms presented in [30, 74]. The processes of watermark embedding and extraction are shown in Fig. 6.3. The original binary image f(x, y) is watermarked in the DCT domain to get the watermarked binary image g(x, y). The watermark W is a bitstream of 0's and 1's. It is embedded in the most significant DCT coefficients [30] with a significantly large embedding strength (the scaling factor α) so that it can survive the binarization process. The watermark embedding process operates on 4 × 4 blocks, and the zero-AC-dominant blocks are skipped. The binarization after embedding uses the mid-point threshold. In the following, we discuss the algorithm in detail.
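The core per-block operation of this algorithm (scale the dominant AC coefficient, then re-binarize) can be sketched as follows. This is a simplified illustration under our own naming, using a matrix form of the orthonormal 4 × 4 DCT; it omits the ECC coding and the block-selection logic described in this section:

```python
import math

N = 4
# Orthonormal DCT-II basis matrix: dct2(B) = C B C^T, idct2(F) = C^T F C.
C = [[math.sqrt((1 if u == 0 else 2) / N)
      * math.cos((2 * x + 1) * u * math.pi / (2 * N)) for x in range(N)]
     for u in range(N)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(r) for r in zip(*m)]

def dct2(block):
    return matmul(matmul(C, block), transpose(C))

def idct2(coef):
    return matmul(matmul(transpose(C), coef), C)

def embed_bit(block, alpha=2.5):
    """Embed a '1': scale the absolutely largest AC coefficient by (1 + alpha),
    as in Eq. (6.1), then re-binarize with the mid-point threshold."""
    f = dct2(block)
    u_l, v_l = max(((u, v) for u in range(N) for v in range(N) if (u, v) != (0, 0)),
                   key=lambda p: abs(f[p[0]][p[1]]))
    f[u_l][v_l] *= (1 + alpha)
    g = idct2(f)
    lo = min(min(r) for r in g)
    hi = max(max(r) for r in g)
    t = (lo + hi) / 2                      # mid-point threshold
    return [[1 if v >= t else 0 for v in r] for r in g]
```

Note that for some input blocks the re-binarized result equals the input, which is exactly the non-embeddable-block phenomenon of Section 6.2.2.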

(a) Watermark embedding. (b) Watermark extraction. Figure 6.3: The proposed DCT-domain watermarking algorithm for binary images.

6.3.1 Watermark Embedding

The steps below show the watermark embedding procedure:

1. The original binary image f(x, y) of size M × N is split into non-overlapped blocks of 4 × 4 pixels.

2. A random bitstream of 0's and 1's is generated as the watermark signal W of length L_W to be embedded. This signal is then coded using ECC [52] as W_c of length L_Wc ≤ N_E, where N_E is the number of nonzero-AC-dominant blocks in f(x, y).

3. All zero-AC-dominant blocks (identified before or during the watermark embedding process) are skipped, and a blockwise DCT is performed on all nonzero-AC-dominant blocks f_i.

4. For W_c(i) = 0, 0 ≤ i < L_Wc, keep the block unchanged, i.e., G_i = F_i, where F_i denotes the DCT of the block f_i and G_i denotes the DCT after modification. For W_c(i) = 1, select the largest (in absolute value) AC coefficient of F_i for embedding the i-th watermark element W_c(i). Specifically,

    G_i(u_L, v_L) = F_i(u_L, v_L) · (1 + α)                        (6.1)

where F_i(u_L, v_L) and G_i(u_L, v_L) are the selected and modified DCT coefficients, respectively, and α is the scaling factor.

5. We then take an inverse DCT (IDCT) of G_i to get a gray-level block g_i^g. This block is binarized using a mid-point threshold T_mid,i (the average of the maximum and minimum intensities) to obtain a binary image block g_i. The watermarked binary image g(x, y) is obtained by replacing the corresponding blocks f_i in f(x, y) with g_i. In practice, the DCT and IDCT of the nonzero-AC-dominant blocks are only needed for W_c(i) = 1.

6.3.2 Watermark Extraction

In watermark extraction, the test binary image g(x, y) and the original binary image f(x, y) are each split into non-overlapped 4 × 4 blocks. A blockwise DCT is performed for all nonzero-AC-dominant blocks in f(x, y) and for the corresponding blocks in g(x, y). The watermark signal Ŵ_c is extracted from these blocks as

    Ŵ_c(i) = { 1, if |G_i(u_L, v_L)| / |F_i(u_L, v_L)| > 1
             { 0, otherwise.

where F_i(u_L, v_L) is the absolutely largest DCT coefficient of the block in f(x, y) and G_i(u_L, v_L) is the corresponding DCT coefficient of the corresponding block in g(x, y). ECC decoding is then performed to get Ŵ from Ŵ_c.

6.3.3 The Exhaustive Study Results

The exhaustive study results for the simple AC embedding algorithm presented above are shown in Table 6.1.

Table 6.1: Exhaustive study results for the proposed simple AC embedding algorithm in the DCT domain. (Columns: α, V_Num, V_Ratio, F_Max, F_Ave, DRD_Max, DRD_Ave, ER.)

From the table, we can see that the number of blocks that can be embedded successfully is very small for α around 1, which is already about 10 times the value used in [30]. As we increase α, more and more blocks become embeddable, while at the same time more pixels are flipped and the level of distortion increases. On the other hand, as long as the embedding can be done (the resulting image block is different from the original block), the embedded information bit can be extracted successfully.

6.4 Enhancement of the AC Embedding Algorithm

Two techniques are developed to enhance the simple AC embedding algorithm so that a better balance between the success rate (V_Ratio) and the image quality (F_Ave or DRD_Ave) can be achieved.

6.4.1 AC Coefficient Modification

Low AC coefficients represent the gradual changes in the spatial domain, and high AC coefficients indicate the degree of variation in the intensities. From the results obtained for the simple AC embedding algorithm, poor-quality watermarked images are characterized by visually perceivable random-noise-like distortion. Therefore, to improve the quality, we increase the low AC coefficients slightly to encourage gradual change in spatial intensities and decrease the high AC coefficients slightly to discourage obtrusive changes (noisy distortion). This also tends to increase the chance of successful embedding, since gradual changes produce more discrete (gray) levels in the spatial domain. The more discrete levels are available, the more likely the embedded information can survive binarization: a major reason that binarization tends to erase the embedded watermark is that, when only a few levels are available, it is hard to choose a binarization threshold that produces a modified image block different from the original one, which is what happens

to those non-embeddable blocks. This enhancement procedure is applied just after step 4 of the watermark embedding process described in Section 6.3.1, and it is implemented as follows:

1. We label the 4 × 4 DCT coefficients in zigzag order as shown in Fig. 6.4. Coefficients 1 to 5 are classified as the low AC band and coefficients 10 to 15 as the high AC band.

2. In the low AC band, choose the lowest two nonzero coefficients (in the order 1, 2, 3, 4, 5), G_i(u_LAC1, v_LAC1) and G_i(u_LAC2, v_LAC2), if they exist. G_i(u_L, v_L), the coefficient modified in (6.1), is skipped in the selection if it happens to be one of the lowest two. Modify these two coefficients as follows:

    G_i(u_LAC1, v_LAC1) = G_i(u_LAC1, v_LAC1) · (1 + β)            (6.2)
    G_i(u_LAC2, v_LAC2) = G_i(u_LAC2, v_LAC2) · (1 + β)            (6.3)

where β is an enhancement factor. If all low-AC-band coefficients (except G_i(u_L, v_L), if it is in the low AC band) are zeros, set the lowest two (except G_i(u_L, v_L), if it is one of the lowest two) to a small value in our experiments.

3. In the high AC band, choose the highest two nonzero coefficients (in the order 15, 14, 13, ...), G_i(u_HAC1, v_HAC1) and G_i(u_HAC2, v_HAC2), if they exist, skipping G_i(u_L, v_L) if it happens to be one of the highest two. Modify these two coefficients as follows:

    G_i(u_HAC1, v_HAC1) = G_i(u_HAC1, v_HAC1)/(1 + β)              (6.4)
    G_i(u_HAC2, v_HAC2) = G_i(u_HAC2, v_HAC2)/(1 + β)              (6.5)
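Steps 2 and 3 above can be sketched as an in-place pass over the zigzag-labelled coefficients. The function name, the β default (α/10 with α = 2.5), and the fallback value `eps` are our own assumptions, since the exact seed value is not reproduced in the text:

```python
# Zigzag labelling of the 4x4 DCT coefficients (cf. Fig. 6.4); label 0 is the DC.
ZIGZAG = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2),
          (2, 1), (3, 0), (3, 1), (2, 2), (1, 3), (2, 3), (3, 2), (3, 3)]

def enhance(coef, dominant, beta=0.25, eps=0.01):
    """Boost the two lowest nonzero low-band coefficients (labels 1-5) by
    (1 + beta), Eqs. (6.2)-(6.3), and attenuate the two highest nonzero
    high-band coefficients (labels 15 down to 10) by 1/(1 + beta),
    Eqs. (6.4)-(6.5), skipping the dominant coefficient of Eq. (6.1)."""
    boosted = 0
    for k in range(1, 6):                        # low AC band
        u, v = ZIGZAG[k]
        if (u, v) == dominant or coef[u][v] == 0:
            continue
        coef[u][v] *= (1 + beta)
        boosted += 1
        if boosted == 2:
            break
    if boosted == 0:                             # whole low band was zero:
        for k in range(1, 6):                    # seed the lowest two labels
            u, v = ZIGZAG[k]
            if (u, v) != dominant:
                coef[u][v] = eps                 # eps is a placeholder value
                boosted += 1
                if boosted == 2:
                    break
    attenuated = 0
    for k in range(15, 9, -1):                   # high AC band
        u, v = ZIGZAG[k]
        if (u, v) == dominant or coef[u][v] == 0:
            continue
        coef[u][v] /= (1 + beta)
        attenuated += 1
        if attenuated == 2:
            break
    return coef
```

The boost/attenuate split matches the design intent: encourage gradual spatial change (low band up) while suppressing noise-like variation (high band down).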

Figure 6.4: The classification of the AC coefficients of DCT.

The enhancement factor β should be much smaller than the scaling factor α used in the modification of the dominant coefficient. We choose β = α/10 in our experiments.

6.4.2 Biased Binarization according to Spatial- and Frequency-Domain Characteristics

It is observed that, because of the strengthening of the low AC content, which encourages gradual change, blocks dominated by black or white pixels tend to suffer more severe distortion. Therefore, in step 5 of Section 6.3.1, instead of using a mid-point threshold, we bias the binarization threshold for these blocks to reduce distortion, in the following manner:

1. Record the number of white pixels ("1"s) in the original block f_i of f(x, y) as S1s_i.

2. Set PL_i = uL_i + vL_i, where (uL_i, vL_i) is the position of the dominant coefficient in step 4 of Section 6.3.1. Thus, PL_i indicates the strongest frequency characteristic of the image block.

3. The binarization threshold is biased to get T_bias,i according to the follow-

ing:

    T_bias,i = { T_mid,i · (1 + γ · PL_i), for S1s_i ≤ T_B
               { T_mid,i · (1 − γ · PL_i), for S1s_i ≥ T_W        (6.6)
               { T_mid,i,                  otherwise

where γ is a small biasing factor, and T_B and T_W are the thresholds for black ("0")-dominant blocks and white ("1")-dominant blocks, respectively. Thus, the biasing is smaller if a low AC coefficient is dominant (smaller PL_i) and larger if a high AC coefficient is dominant (larger PL_i). In our experiments, we choose γ = 0.05, T_B = 6 and T_W = 10.

6.4.3 Performance Improvement in Exhaustive Study Results

The exhaustive study results of the enhanced algorithm are shown in Table 6.2. To illustrate the improvement, we plot V_Ratio, F_Max, F_Ave, DRD_Max, and DRD_Ave against the scaling factor α for the simple AC embedding algorithm and the enhanced one in Figs. 6.5, 6.6, 6.7, 6.8 and 6.9, respectively. Comparing the tables and plots, we observe that although there is not much difference in the ratio of embeddable blocks, the enhanced version clearly reduces the visual distortion due to embedding.

6.5 Experimental Results

Here we show the experimental results of the enhanced watermarking algorithm in the DCT domain. The original image used is the text image shown in Fig. 6.10(a). It is of size 512 × 512, and there are 5247 (≈32%) nonzero-AC-dominant

Table 6.2: Exhaustive study results for the enhanced DCT-domain embedding algorithm. (Columns: α, V_Num, V_Ratio, F_Max, F_Ave, DRD_Max, DRD_Ave, ER.)

Figure 6.5: Comparison of V_Ratio before and after enhancement.

Figure 6.6: Comparison of F_Max before and after enhancement.

Figure 6.7: Comparison of F_Ave before and after enhancement.

Figure 6.8: Comparison of DRD_Max before and after enhancement.

Figure 6.9: Comparison of DRD_Ave before and after enhancement.

blocks out of 16384 blocks. As in Chapter 5, we employ ECC to reduce the error rate in extraction. Since the capacity of the proposed AC-component embedding algorithm (5247) is much larger than that in Chapter 5 (2512), to embed approximately the same amount of information we employ a code with a higher coding rate, BCH(63,7). The data before BCH coding is 581 bits and after encoding is 5229 bits; the information embedded is more than that in Section 5.2.4. The image after watermark embedding is shown in Fig. 6.10(b). The number of changed pixels is 3041 (≈1.16%). The PSNR for the watermarked image is 19.36 dB (the DRD is again computed with m = 5). Some flipped pixels are obtrusive on visual inspection, due to the difficulty of frequency-domain embedding. Fig. 6.11 shows the pixels flipped due to the watermark embedding. The original binary image is brightened and the flipped pixels are shown as black dots. Most of the flipped pixels are near the contours of the characters. Thus, the embedding does not affect the quality of the document image much. There are 586 (≈11.2%) bit errors in the extracted Ŵ_c; nevertheless, there is no error in the decoded Ŵ, thanks to the BCH coding. The robustness against cropping and additive noise is shown in Fig. 6.12 and Fig. 6.13, respectively. The cropping test is again implemented as in [8], where a number of rows are cropped from the watermarked image and the cropped portion is inserted into the original image to extract the watermark. The robustness against additive noise is tested by adding Gaussian white noise of mean 0 and variances ranging from 0.01 to 0.1. As in Chapter 5, the image after adding noise is gray and needs to be binarized using a mid-point threshold. The ratio of pixels changed in the watermarked binary image before and after adding noise is shown by the dash-dot line in Fig. 6.13. We can see that the ECC coding, BCH(63,7), is more effective against random additive noise than against

(a) Original binary image. (b) Watermarked binary image. Figure 6.10: Original and watermarked binary images for the proposed DCT-domain watermarking algorithm.
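The additive-noise test described above (add zero-mean Gaussian noise, re-binarize with a mid-point threshold, count flipped pixels) can be sketched as follows; the function name and seed handling are our own:

```python
import random

def flip_ratio_under_noise(image, variance, seed=0):
    """Add zero-mean Gaussian noise of the given variance to a binary image,
    re-binarize with the mid-point threshold, and return the fraction of
    pixels flipped by the noise-and-binarize round trip."""
    rng = random.Random(seed)
    sigma = variance ** 0.5
    noisy = [[p + rng.gauss(0.0, sigma) for p in row] for row in image]
    lo = min(min(r) for r in noisy)
    hi = max(max(r) for r in noisy)
    t = (lo + hi) / 2                       # mid-point threshold
    rebin = [[1 if v >= t else 0 for v in row] for row in noisy]
    flips = sum(a != b for ra, rb in zip(image, rebin) for a, b in zip(ra, rb))
    return flips / (len(image) * len(image[0]))
```

As the variance grows, a larger fraction of pixels crosses the mid-point threshold and flips, which is the dash-dot curve reported in Fig. 6.13.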

Figure 6.11: Flipped pixels in the watermarked image in Fig. 6.10(b).

cropping, and its effectiveness (improvement) decreases as the amount of noise increases. Compared with the results in Section 5.2.4, this algorithm is more robust against additive noise.

6.6 Chapter Summary

We studied the frequency-domain approach for binary image watermarking in this chapter. The watermark is embedded with significantly large strength into the absolutely largest AC coefficients of the DCT of the embeddable (nonzero-AC-dominant) 4 × 4 blocks. To improve the performance (to reduce the distortion in particular), several other AC coefficients are modified as well, and the binarization is biased according to the spatial- and frequency-domain characteristics. The proposed algorithm is a non-blind one, requiring the presence of the original image in watermark extraction. It shows some robustness, and the distortion introduced is not obtrusive.


More information

DATA EMBEDDING IN TEXT FOR A COPIER SYSTEM

DATA EMBEDDING IN TEXT FOR A COPIER SYSTEM DATA EMBEDDING IN TEXT FOR A COPIER SYSTEM Anoop K. Bhattacharjya and Hakan Ancin Epson Palo Alto Laboratory 3145 Porter Drive, Suite 104 Palo Alto, CA 94304 e-mail: {anoop, ancin}@erd.epson.com Abstract

More information

Comparison of Digital Image Watermarking Algorithms. Xu Zhou Colorado School of Mines December 1, 2014

Comparison of Digital Image Watermarking Algorithms. Xu Zhou Colorado School of Mines December 1, 2014 Comparison of Digital Image Watermarking Algorithms Xu Zhou Colorado School of Mines December 1, 2014 Outlier Introduction Background on digital image watermarking Comparison of several algorithms Experimental

More information

Secured Watermarking in DCT Domain using CRT and Complexity Analysis

Secured Watermarking in DCT Domain using CRT and Complexity Analysis Secured Watermarking in DCT Domain using CRT and Complexity Analysis Varun Kumar Department of Electronics & Communication Engg Om Institute of Technology and Management, HISAR-125001, INDIA Kuldeep Bhardwaj

More information

ANALYSIS OF DIFFERENT DOMAIN WATERMARKING TECHNIQUES

ANALYSIS OF DIFFERENT DOMAIN WATERMARKING TECHNIQUES ANALYSIS OF DIFFERENT DOMAIN WATERMARKING TECHNIQUES 1 Maneet, 2 Prabhjot Kaur 1 Assistant Professor, AIMT/ EE Department, Indri-Karnal, India Email: maneetkaur122@gmail.com 2 Assistant Professor, AIMT/

More information

Information and Communications Security: Encryption and Information Hiding

Information and Communications Security: Encryption and Information Hiding Short Course on Information and Communications Security: Encryption and Information Hiding Tuesday, 10 March Friday, 13 March, 2015 Lecture 10: Information Hiding Contents Covert Encryption Principles

More information

CHAPTER 4 REVERSIBLE IMAGE WATERMARKING USING BIT PLANE CODING AND LIFTING WAVELET TRANSFORM

CHAPTER 4 REVERSIBLE IMAGE WATERMARKING USING BIT PLANE CODING AND LIFTING WAVELET TRANSFORM 74 CHAPTER 4 REVERSIBLE IMAGE WATERMARKING USING BIT PLANE CODING AND LIFTING WAVELET TRANSFORM Many data embedding methods use procedures that in which the original image is distorted by quite a small

More information

Feature Based Watermarking Algorithm by Adopting Arnold Transform

Feature Based Watermarking Algorithm by Adopting Arnold Transform Feature Based Watermarking Algorithm by Adopting Arnold Transform S.S. Sujatha 1 and M. Mohamed Sathik 2 1 Assistant Professor in Computer Science, S.T. Hindu College, Nagercoil, Tamilnadu, India 2 Associate

More information

A ROBUST WATERMARKING SCHEME BASED ON EDGE DETECTION AND CONTRAST SENSITIVITY FUNCTION

A ROBUST WATERMARKING SCHEME BASED ON EDGE DETECTION AND CONTRAST SENSITIVITY FUNCTION A ROBUST WATERMARKING SCHEME BASED ON EDGE DETECTION AND CONTRAST SENSITIVITY FUNCTION John N. Ellinas Department of Electronic Computer Systems,Technological Education Institute of Piraeus, 12244 Egaleo,

More information

No Reference Medical Image Quality Measurement Based on Spread Spectrum and Discrete Wavelet Transform using ROI Processing

No Reference Medical Image Quality Measurement Based on Spread Spectrum and Discrete Wavelet Transform using ROI Processing No Reference Medical Image Quality Measurement Based on Spread Spectrum and Discrete Wavelet Transform using ROI Processing Arash Ashtari Nakhaie, Shahriar Baradaran Shokouhi Iran University of Science

More information

DYADIC WAVELETS AND DCT BASED BLIND COPY-MOVE IMAGE FORGERY DETECTION

DYADIC WAVELETS AND DCT BASED BLIND COPY-MOVE IMAGE FORGERY DETECTION DYADIC WAVELETS AND DCT BASED BLIND COPY-MOVE IMAGE FORGERY DETECTION Ghulam Muhammad*,1, Muhammad Hussain 2, Anwar M. Mirza 1, and George Bebis 3 1 Department of Computer Engineering, 2 Department of

More information

A Reversible Data Hiding Scheme for BTC- Compressed Images

A Reversible Data Hiding Scheme for BTC- Compressed Images IJACSA International Journal of Advanced Computer Science and Applications, A Reversible Data Hiding Scheme for BTC- Compressed Images Ching-Chiuan Lin Shih-Chieh Chen Department of Multimedia and Game

More information

COMPARISONS OF DCT-BASED AND DWT-BASED WATERMARKING TECHNIQUES

COMPARISONS OF DCT-BASED AND DWT-BASED WATERMARKING TECHNIQUES COMPARISONS OF DCT-BASED AND DWT-BASED WATERMARKING TECHNIQUES H. I. Saleh 1, M. E. Elhadedy 2, M. A. Ashour 1, M. A. Aboelsaud 3 1 Radiation Engineering Dept., NCRRT, AEA, Egypt. 2 Reactor Dept., NRC,

More information

ADAPTIVE VIDEO STREAMING FOR BANDWIDTH VARIATION WITH OPTIMUM QUALITY

ADAPTIVE VIDEO STREAMING FOR BANDWIDTH VARIATION WITH OPTIMUM QUALITY ADAPTIVE VIDEO STREAMING FOR BANDWIDTH VARIATION WITH OPTIMUM QUALITY Joseph Michael Wijayantha Medagama (08/8015) Thesis Submitted in Partial Fulfillment of the Requirements for the Degree Master of Science

More information

Image and Video Watermarking

Image and Video Watermarking Telecommunications Seminar WS 1998 Data Hiding, Digital Watermarking and Secure Communications Image and Video Watermarking Herbert Buchner University of Erlangen-Nuremberg 16.12.1998 Outline 1. Introduction:

More information

DATA HIDING IN PDF FILES AND APPLICATIONS BY IMPERCEIVABLE MODIFICATIONS OF PDF OBJECT PARAMETERS

DATA HIDING IN PDF FILES AND APPLICATIONS BY IMPERCEIVABLE MODIFICATIONS OF PDF OBJECT PARAMETERS DATA HIDING IN PDF FILES AND APPLICATIONS BY IMPERCEIVABLE MODIFICATIONS OF PDF OBJECT PARAMETERS 1 Jiun-Tsung Wang ( 王竣聰 ) and 2 Wen-Hsiang Tsai ( 蔡文祥 ) 1 Institute of Multimedia Eng., National Chiao

More information

Image Error Concealment Based on Watermarking

Image Error Concealment Based on Watermarking Image Error Concealment Based on Watermarking Shinfeng D. Lin, Shih-Chieh Shie and Jie-Wei Chen Department of Computer Science and Information Engineering,National Dong Hwa Universuty, Hualien, Taiwan,

More information

QR Code Watermarking Algorithm based on Wavelet Transform

QR Code Watermarking Algorithm based on Wavelet Transform 2013 13th International Symposium on Communications and Information Technologies (ISCIT) QR Code Watermarking Algorithm based on Wavelet Transform Jantana Panyavaraporn 1, Paramate Horkaew 2, Wannaree

More information

Digital Image Watermarking Using DWT and SLR Technique Against Geometric Attacks

Digital Image Watermarking Using DWT and SLR Technique Against Geometric Attacks Digital Image Watermarking Using DWT and SLR Technique Against Geometric Attacks Sarvesh Kumar Yadav, Mrs. Shital Gupta, Prof. Vineet richariya Abstract- Now days digital watermarking is very popular field

More information

Filtering. -If we denote the original image as f(x,y), then the noisy image can be denoted as f(x,y)+n(x,y) where n(x,y) is a cosine function.

Filtering. -If we denote the original image as f(x,y), then the noisy image can be denoted as f(x,y)+n(x,y) where n(x,y) is a cosine function. Filtering -The image shown below has been generated by adding some noise in the form of a cosine function. -If we denote the original image as f(x,y), then the noisy image can be denoted as f(x,y)+n(x,y)

More information

No-reference perceptual quality metric for H.264/AVC encoded video. Maria Paula Queluz

No-reference perceptual quality metric for H.264/AVC encoded video. Maria Paula Queluz No-reference perceptual quality metric for H.264/AVC encoded video Tomás Brandão Maria Paula Queluz IT ISCTE IT IST VPQM 2010, Scottsdale, USA, January 2010 Outline 1. Motivation and proposed work 2. Technical

More information

CHAPTER-4 WATERMARKING OF GRAY IMAGES

CHAPTER-4 WATERMARKING OF GRAY IMAGES CHAPTER-4 WATERMARKING OF GRAY IMAGES 4.1 INTRODUCTION Like most DCT based watermarking schemes, Middle-Band Coefficient Exchange scheme has proven its robustness against those attacks, which anyhow, do

More information

A New Approach to Compressed Image Steganography Using Wavelet Transform

A New Approach to Compressed Image Steganography Using Wavelet Transform IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 5, Ver. III (Sep. Oct. 2015), PP 53-59 www.iosrjournals.org A New Approach to Compressed Image Steganography

More information

Mr Mohan A Chimanna 1, Prof.S.R.Khot 2

Mr Mohan A Chimanna 1, Prof.S.R.Khot 2 Digital Video Watermarking Techniques for Secure Multimedia Creation and Delivery Mr Mohan A Chimanna 1, Prof.S.R.Khot 2 1 Assistant Professor,Department of E&Tc, S.I.T.College of Engineering, Yadrav,Maharashtra,

More information

Introduction to Visible Watermarking. IPR Course: TA Lecture 2002/12/18 NTU CSIE R105

Introduction to Visible Watermarking. IPR Course: TA Lecture 2002/12/18 NTU CSIE R105 Introduction to Visible Watermarking IPR Course: TA Lecture 2002/12/18 NTU CSIE R105 Outline Introduction State-of of-the-art Characteristics of Visible Watermarking Schemes Attacking Visible Watermarking

More information

AN EFFICIENT VIDEO WATERMARKING USING COLOR HISTOGRAM ANALYSIS AND BITPLANE IMAGE ARRAYS

AN EFFICIENT VIDEO WATERMARKING USING COLOR HISTOGRAM ANALYSIS AND BITPLANE IMAGE ARRAYS AN EFFICIENT VIDEO WATERMARKING USING COLOR HISTOGRAM ANALYSIS AND BITPLANE IMAGE ARRAYS G Prakash 1,TVS Gowtham Prasad 2, T.Ravi Kumar Naidu 3 1MTech(DECS) student, Department of ECE, sree vidyanikethan

More information

A Robust Wavelet-Based Watermarking Algorithm Using Edge Detection

A Robust Wavelet-Based Watermarking Algorithm Using Edge Detection A Robust Wavelet-Based Watermarking Algorithm Using Edge Detection John N. Ellinas Abstract In this paper, a robust watermarking algorithm using the wavelet transform and edge detection is presented. The

More information

Data Hiding on Text Using Big-5 Code

Data Hiding on Text Using Big-5 Code Data Hiding on Text Using Big-5 Code Jun-Chou Chuang 1 and Yu-Chen Hu 2 1 Department of Computer Science and Communication Engineering Providence University 200 Chung-Chi Rd., Shalu, Taichung 43301, Republic

More information

A Novel Secure Digital Watermark Generation from Public Share by Using Visual Cryptography and MAC Techniques

A Novel Secure Digital Watermark Generation from Public Share by Using Visual Cryptography and MAC Techniques Bashar S. Mahdi Alia K. Abdul Hassan Department of Computer Science, University of Technology, Baghdad, Iraq A Novel Secure Digital Watermark Generation from Public Share by Using Visual Cryptography and

More information

Digital Image Steganography Techniques: Case Study. Karnataka, India.

Digital Image Steganography Techniques: Case Study. Karnataka, India. ISSN: 2320 8791 (Impact Factor: 1.479) Digital Image Steganography Techniques: Case Study Santosh Kumar.S 1, Archana.M 2 1 Department of Electronicsand Communication Engineering, Sri Venkateshwara College

More information

Comparison of Wavelet Based Watermarking Techniques for Various Attacks

Comparison of Wavelet Based Watermarking Techniques for Various Attacks International Journal of Engineering and Technical Research (IJETR) ISSN: 2321-0869, Volume-3, Issue-4, April 2015 Comparison of Wavelet Based Watermarking Techniques for Various Attacks Sachin B. Patel,

More information

OTP-Steg. One-Time Pad Image Steganography Using OTP-Steg V.1.0 Software October 2015 Dr. Michael J. Pelosi

OTP-Steg. One-Time Pad Image Steganography Using OTP-Steg V.1.0 Software October 2015 Dr. Michael J. Pelosi OTP-Steg One-Time Pad Image Steganography Using OTP-Steg V.1.0 Software October 2015 Dr. Michael J. Pelosi What is Steganography? Steganography literally means covered writing Encompasses methods of transmitting

More information

CHAPTER 3 DIFFERENT DOMAINS OF WATERMARKING. domain. In spatial domain the watermark bits directly added to the pixels of the cover

CHAPTER 3 DIFFERENT DOMAINS OF WATERMARKING. domain. In spatial domain the watermark bits directly added to the pixels of the cover 38 CHAPTER 3 DIFFERENT DOMAINS OF WATERMARKING Digital image watermarking can be done in both spatial domain and transform domain. In spatial domain the watermark bits directly added to the pixels of the

More information

COMPARISON OF WATERMARKING TECHNIQUES DWT, DWT-DCT & DWT-DCT-PSO ON THE BASIS OF PSNR & MSE

COMPARISON OF WATERMARKING TECHNIQUES DWT, DWT-DCT & DWT-DCT-PSO ON THE BASIS OF PSNR & MSE COMPARISON OF WATERMARKING TECHNIQUES DWT, DWT-DCT & DWT-DCT-PSO ON THE BASIS OF PSNR & MSE Rashmi Dewangan 1, Yojana Yadav 2 1,2 Electronics and Telecommunication Department, Chhatrapati Shivaji Institute

More information

Comparative Analysis of Different Spatial and Transform Domain based Image Watermarking Techniques

Comparative Analysis of Different Spatial and Transform Domain based Image Watermarking Techniques Comparative Analysis of Different Spatial and Transform Domain based Image Watermarking Techniques 1 Himanshu Verma, Mr Tarun Rathi, 3 Mr Ashish Singh Chauhan 1 Research Scholar, Deptt of Electronics and

More information

CHAPTER-6 WATERMARKING OF JPEG IMAGES

CHAPTER-6 WATERMARKING OF JPEG IMAGES CHAPTER-6 WATERMARKING OF JPEG IMAGES 6.1 INTRODUCTION In the Chapter 4, we have discussed that we can improve the robustness of DCT and DWT based watermarking schemes against some well known attacks by

More information

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi 1. Introduction The choice of a particular transform in a given application depends on the amount of

More information

Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding.

Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding. Project Title: Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding. Midterm Report CS 584 Multimedia Communications Submitted by: Syed Jawwad Bukhari 2004-03-0028 About

More information

Region Based Even Odd Watermarking Method With Fuzzy Wavelet

Region Based Even Odd Watermarking Method With Fuzzy Wavelet Region Based Even Odd Watermarking Method With Fuzzy Wavelet S.Maruthuperumal 1, G.Rosline Nesakumari 1, Dr.V.Vijayakumar 2 1 Research Scholar, Dr.MGR University Chennai. Associate Professor, GIET Rajahmundry,

More information

A Robust Video Hash Scheme Based on. 2D-DCT Temporal Maximum Occurrence

A Robust Video Hash Scheme Based on. 2D-DCT Temporal Maximum Occurrence A Robust Video Hash Scheme Based on 1 2D-DCT Temporal Maximum Occurrence Qian Chen, Jun Tian, and Dapeng Wu Abstract In this paper, we propose a video hash scheme that utilizes image hash and spatio-temporal

More information

Improved Qualitative Color Image Steganography Based on DWT

Improved Qualitative Color Image Steganography Based on DWT Improved Qualitative Color Image Steganography Based on DWT 1 Naresh Goud M, II Arjun Nelikanti I, II M. Tech student I, II Dept. of CSE, I, II Vardhaman College of Eng. Hyderabad, India Muni Sekhar V

More information

Digital Image Watermarking Using DWT Based DCT Technique

Digital Image Watermarking Using DWT Based DCT Technique International Journal of Recent Research and Review, Vol. VII, Issue 4, December 2014 ISSN 2277 8322 Digital Image Watermarking Using DWT Based DCT Technique Digvijaysinh Vaghela, Ram Kishan Bairwa Research

More information

Visible and Long-Wave Infrared Image Fusion Schemes for Situational. Awareness

Visible and Long-Wave Infrared Image Fusion Schemes for Situational. Awareness Visible and Long-Wave Infrared Image Fusion Schemes for Situational Awareness Multi-Dimensional Digital Signal Processing Literature Survey Nathaniel Walker The University of Texas at Austin nathaniel.walker@baesystems.com

More information

Computer Vision 2. SS 18 Dr. Benjamin Guthier Professur für Bildverarbeitung. Computer Vision 2 Dr. Benjamin Guthier

Computer Vision 2. SS 18 Dr. Benjamin Guthier Professur für Bildverarbeitung. Computer Vision 2 Dr. Benjamin Guthier Computer Vision 2 SS 18 Dr. Benjamin Guthier Professur für Bildverarbeitung Computer Vision 2 Dr. Benjamin Guthier 1. IMAGE PROCESSING Computer Vision 2 Dr. Benjamin Guthier Content of this Chapter Non-linear

More information

Data Hiding in Video

Data Hiding in Video Data Hiding in Video J. J. Chae and B. S. Manjunath Department of Electrical and Computer Engineering University of California, Santa Barbara, CA 9316-956 Email: chaejj, manj@iplab.ece.ucsb.edu Abstract

More information

Digital Image Steganography Using Bit Flipping

Digital Image Steganography Using Bit Flipping BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 18, No 1 Sofia 2018 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2018-0006 Digital Image Steganography Using

More information

A Comparison of Still-Image Compression Standards Using Different Image Quality Metrics and Proposed Methods for Improving Lossy Image Quality

A Comparison of Still-Image Compression Standards Using Different Image Quality Metrics and Proposed Methods for Improving Lossy Image Quality A Comparison of Still-Image Compression Standards Using Different Image Quality Metrics and Proposed Methods for Improving Lossy Image Quality Multidimensional DSP Literature Survey Eric Heinen 3/21/08

More information

A Video Watermarking Algorithm Based on the Human Visual System Properties

A Video Watermarking Algorithm Based on the Human Visual System Properties A Video Watermarking Algorithm Based on the Human Visual System Properties Ji-Young Moon 1 and Yo-Sung Ho 2 1 Samsung Electronics Co., LTD 416, Maetan3-dong, Paldal-gu, Suwon-si, Gyenggi-do, Korea jiyoung.moon@samsung.com

More information

DIGITAL watermarking technology is emerging as a

DIGITAL watermarking technology is emerging as a 126 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 13, NO. 2, FEBRUARY 2004 Analysis and Design of Watermarking Algorithms for Improved Resistance to Compression Chuhong Fei, Deepa Kundur, Senior Member,

More information

A reversible data hiding based on adaptive prediction technique and histogram shifting

A reversible data hiding based on adaptive prediction technique and histogram shifting A reversible data hiding based on adaptive prediction technique and histogram shifting Rui Liu, Rongrong Ni, Yao Zhao Institute of Information Science Beijing Jiaotong University E-mail: rrni@bjtu.edu.cn

More information

Robust Watermarking Method for Color Images Using DCT Coefficients of Watermark

Robust Watermarking Method for Color Images Using DCT Coefficients of Watermark Global Journal of Computer Science and Technology Graphics & Vision Volume 12 Issue 12 Version 1.0 Year 2012 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc.

More information

A NOVEL APPROACH FOR IMAGE WATERMARKING USING DCT AND JND TECHNIQUES

A NOVEL APPROACH FOR IMAGE WATERMARKING USING DCT AND JND TECHNIQUES A NOVEL APPROACH FOR IMAGE WATERMARKING USING DCT AND JND TECHNIQUES Mrs. R.J. Shelke, Dept of Electronics Engineering, Walchand Institute of Technology, Solapur 43002, India ABSTRACT Dr. Mrs. S.S. Apte,

More information

A Detailed look of Audio Steganography Techniques using LSB and Genetic Algorithm Approach

A Detailed look of Audio Steganography Techniques using LSB and Genetic Algorithm Approach www.ijcsi.org 402 A Detailed look of Audio Steganography Techniques using LSB and Genetic Algorithm Approach Gunjan Nehru 1, Puja Dhar 2 1 Department of Information Technology, IEC-Group of Institutions

More information

A Robust Hybrid Blind Digital Image Watermarking System Using Discrete Wavelet Transform and Contourlet Transform

A Robust Hybrid Blind Digital Image Watermarking System Using Discrete Wavelet Transform and Contourlet Transform A Robust Hybrid Blind Digital Image System Using Discrete Wavelet Transform and Contourlet Transform Nidal F. Shilbayeh, Belal AbuHaija, Zainab N. Al-Qudsy Abstract In this paper, a hybrid blind digital

More information

A NOVEL SECURED BOOLEAN BASED SECRET IMAGE SHARING SCHEME

A NOVEL SECURED BOOLEAN BASED SECRET IMAGE SHARING SCHEME VOL 13, NO 13, JULY 2018 ISSN 1819-6608 2006-2018 Asian Research Publishing Network (ARPN) All rights reserved wwwarpnjournalscom A NOVEL SECURED BOOLEAN BASED SECRET IMAGE SHARING SCHEME Javvaji V K Ratnam

More information

Robust Zero Watermarking for Still and Similar Images Using a Learning Based Contour Detection

Robust Zero Watermarking for Still and Similar Images Using a Learning Based Contour Detection Robust Zero Watermarking for Still and Similar Images Using a Learning Based Contour Detection Shahryar Ehsaee and Mansour Jamzad (&) Department of Computer Engineering, Sharif University of Technology,

More information

A Robust Color Image Watermarking Using Maximum Wavelet-Tree Difference Scheme

A Robust Color Image Watermarking Using Maximum Wavelet-Tree Difference Scheme A Robust Color Image Watermarking Using Maximum Wavelet-Tree ifference Scheme Chung-Yen Su 1 and Yen-Lin Chen 1 1 epartment of Applied Electronics Technology, National Taiwan Normal University, Taipei,

More information

Scanner Parameter Estimation Using Bilevel Scans of Star Charts

Scanner Parameter Estimation Using Bilevel Scans of Star Charts ICDAR, Seattle WA September Scanner Parameter Estimation Using Bilevel Scans of Star Charts Elisa H. Barney Smith Electrical and Computer Engineering Department Boise State University, Boise, Idaho 8375

More information

International Journal of Advance Research in Computer Science and Management Studies

International Journal of Advance Research in Computer Science and Management Studies Volume 2, Issue 12, December 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

Implementation and Comparison of Watermarking Algorithms using DWT

Implementation and Comparison of Watermarking Algorithms using DWT Implementation and Comparison of Watermarking Algorithms using DWT Bushra Jamal M.Tech. Student Galgotia s College of Engineering & Technology Greater Noida, U.P., India Athar Hussain Asst. Professor School

More information

Robust Lossless Image Watermarking in Integer Wavelet Domain using SVD

Robust Lossless Image Watermarking in Integer Wavelet Domain using SVD Robust Lossless Image Watermarking in Integer Domain using SVD 1 A. Kala 1 PG scholar, Department of CSE, Sri Venkateswara College of Engineering, Chennai 1 akala@svce.ac.in 2 K. haiyalnayaki 2 Associate

More information

Robust DWT Based Technique for Digital Watermarking

Robust DWT Based Technique for Digital Watermarking Robust DWT Based Technique for Digital Watermarking Mamta Jain Department of Electronics & Communication Institute of Engineering & Technology Alwar er.mamtajain@gmail.com Abstract Hiding the information

More information

DCT-Based Image Feature Extraction and Its Application in Image Self-Recovery and Image Watermarking

DCT-Based Image Feature Extraction and Its Application in Image Self-Recovery and Image Watermarking DCT-Based Image Feature Extraction and Its Application in Image Self-Recovery and Image Watermarking Mohamed Hamid A Thesis in The Department of Electrical and Computer Engineering Presented in Partial

More information

COMPARISON BETWEEN TWO WATERMARKING ALGORITHMS USING DCT COEFFICIENT, AND LSB REPLACEMENT

COMPARISON BETWEEN TWO WATERMARKING ALGORITHMS USING DCT COEFFICIENT, AND LSB REPLACEMENT COMPARISO BETWEE TWO WATERMARKIG ALGORITHMS USIG DCT COEFFICIET, AD LSB REPLACEMET Mona M. El-Ghoneimy Associate Professor, Elect. & Comm. Dept., Faculty of Engineering, Cairo University, Post code 12316

More information

A Robust Image Watermarking Scheme using Image Moment Normalization

A Robust Image Watermarking Scheme using Image Moment Normalization A Robust Image ing Scheme using Image Moment Normalization Latha Parameswaran, and K. Anbumani Abstract Multimedia security is an incredibly significant area of concern. A number of papers on robust digital

More information

An Introduction to Content Based Image Retrieval

An Introduction to Content Based Image Retrieval CHAPTER -1 An Introduction to Content Based Image Retrieval 1.1 Introduction With the advancement in internet and multimedia technologies, a huge amount of multimedia data in the form of audio, video and

More information

On domain selection for additive, blind image watermarking

On domain selection for additive, blind image watermarking BULLETIN OF THE POLISH ACADEY OF SCIENCES TECHNICAL SCIENCES, Vol. 60, No. 2, 2012 DOI: 10.2478/v10175-012-0042-5 DEDICATED PAPERS On domain selection for additive, blind image watermarking P. LIPIŃSKI

More information

Copyright Detection System for Videos Using TIRI-DCT Algorithm

Copyright Detection System for Videos Using TIRI-DCT Algorithm Research Journal of Applied Sciences, Engineering and Technology 4(24): 5391-5396, 2012 ISSN: 2040-7467 Maxwell Scientific Organization, 2012 Submitted: March 18, 2012 Accepted: June 15, 2012 Published:

More information

EE795: Computer Vision and Intelligent Systems

EE795: Computer Vision and Intelligent Systems EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 WRI C225 Lecture 04 130131 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Histogram Equalization Image Filtering Linear

More information

EAG: Edge Adaptive Grid Data Hiding for Binary Image Authentication

EAG: Edge Adaptive Grid Data Hiding for Binary Image Authentication EAG: Edge Adaptive Grid Data Hiding for Binary Image Authentication Hong Cao and Alex C. Kot * Institute for Infocomm Research, A*STAR, Singapore E-mail: hcao@i2r.a-star.edu.sg Tel: +65-64082157 * Nanyang

More information

A Revisit to LSB Substitution Based Data Hiding for Embedding More Information

A Revisit to LSB Substitution Based Data Hiding for Embedding More Information A Revisit to LSB Substitution Based Data Hiding for Embedding More Information Yanjun Liu 1,, Chin-Chen Chang 1, and Tzu-Yi Chien 2 1 Department of Information Engineering and Computer Science, Feng Chia

More information

Speech Modulation for Image Watermarking

Speech Modulation for Image Watermarking Speech Modulation for Image Watermarking Mourad Talbi 1, Ben Fatima Sira 2 1 Center of Researches and Technologies of Energy, Tunisia 2 Engineering School of Tunis, Tunisia Abstract Embedding a hidden

More information

SECURE SEMI-FRAGILE WATERMARKING FOR IMAGE AUTHENTICATION

SECURE SEMI-FRAGILE WATERMARKING FOR IMAGE AUTHENTICATION SECURE SEMI-FRAGILE WATERMARKING FOR IMAGE AUTHENTICATION Chuhong Fei a, Raymond Kwong b, and Deepa Kundur c a A.U.G. Signals Ltd., 73 Richmond St. W, Toronto, ON M4H 4E8 Canada b University of Toronto,

More information

DWT-SVD Based Hybrid Approach for Digital Watermarking Using Fusion Method

DWT-SVD Based Hybrid Approach for Digital Watermarking Using Fusion Method DWT-SVD Based Hybrid Approach for Digital Watermarking Using Fusion Method Sonal Varshney M.tech Scholar Galgotias University Abhinandan Singh M.tech Scholar Galgotias University Abstract With the rapid

More information

High Capacity Reversible Watermarking Scheme for 2D Vector Maps

High Capacity Reversible Watermarking Scheme for 2D Vector Maps Scheme for 2D Vector Maps 1 Information Management Department, China National Petroleum Corporation, Beijing, 100007, China E-mail: jxw@petrochina.com.cn Mei Feng Research Institute of Petroleum Exploration

More information

Digital Image Watermarking Scheme Based on LWT and DCT

Digital Image Watermarking Scheme Based on LWT and DCT Digital Image ing Scheme Based on LWT and Amy Tun and Yadana Thein Abstract As a potential solution to defend unauthorized replication of digital multimedia objects, digital watermarking technology is

More information

I. INTRODUCTION. Figure-1 Basic block of text analysis

I. INTRODUCTION. Figure-1 Basic block of text analysis ISSN: 2349-7637 (Online) (RHIMRJ) Research Paper Available online at: www.rhimrj.com Detection and Localization of Texts from Natural Scene Images: A Hybrid Approach Priyanka Muchhadiya Post Graduate Fellow,

More information

Video De-interlacing with Scene Change Detection Based on 3D Wavelet Transform

Video De-interlacing with Scene Change Detection Based on 3D Wavelet Transform Video De-interlacing with Scene Change Detection Based on 3D Wavelet Transform M. Nancy Regina 1, S. Caroline 2 PG Scholar, ECE, St. Xavier s Catholic College of Engineering, Nagercoil, India 1 Assistant

More information

CHAPTER 8 COMPOUND CHARACTER RECOGNITION USING VARIOUS MODELS

CHAPTER 8 COMPOUND CHARACTER RECOGNITION USING VARIOUS MODELS CHAPTER 8 COMPOUND CHARACTER RECOGNITION USING VARIOUS MODELS 8.1 Introduction The recognition systems developed so far were for simple characters comprising of consonants and vowels. But there is one

More information

A New Spatial q-log Domain for Image Watermarking

A New Spatial q-log Domain for Image Watermarking 1 Ta Minh Thanh, 2 Pham Thanh Hiep, 3 Ta Minh Tam 1 Department of Network Security, Le Quy Don Technical University, 100 Hoang Quoc Viet, Cau Giay, Hanoi, Vietnam. E-mail: taminhjp@gmail.com 2 Le Quy Don

More information

Time Stamp Detection and Recognition in Video Frames

Time Stamp Detection and Recognition in Video Frames Time Stamp Detection and Recognition in Video Frames Nongluk Covavisaruch and Chetsada Saengpanit Department of Computer Engineering, Chulalongkorn University, Bangkok 10330, Thailand E-mail: nongluk.c@chula.ac.th

More information

Fusion of Digital Signature & Fingerprint Watermarking using Bit plane slicing

Fusion of Digital Signature & Fingerprint Watermarking using Bit plane slicing RESEARCH ARTICLE OPEN ACCESS Fusion of Digital Signature & Fingerprint Watermarking using Bit plane slicing Sonali V.Satonkar, Dr.seema Kawathekar Dept. of Computer Science &Information Tehnology Dr.Babasaheb

More information

A Novel Spatial Domain Invisible Watermarking Technique Using Canny Edge Detector

A Novel Spatial Domain Invisible Watermarking Technique Using Canny Edge Detector A Novel Spatial Domain Invisible Watermarking Technique Using Canny Edge Detector 1 Vardaini M.Tech, Department of Rayat Institute of Engineering and information Technology, Railmajra 2 Anudeep Goraya

More information

1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra

1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra Pattern Recall Analysis of the Hopfield Neural Network with a Genetic Algorithm Susmita Mohapatra Department of Computer Science, Utkal University, India Abstract: This paper is focused on the implementation

More information

An SVD-based Fragile Watermarking Scheme With Grouped Blocks

An SVD-based Fragile Watermarking Scheme With Grouped Blocks An SVD-based Fragile Watermarking Scheme With Grouped Qingbo Kang Chengdu Yufei Information Engineering Co.,Ltd. 610000 Chengdu, China Email: qdsclove@gmail.com Ke Li, Hu Chen National Key Laboratory of

More information

Digital Image Processing

Digital Image Processing Digital Image Processing Fundamentals of Image Compression DR TANIA STATHAKI READER (ASSOCIATE PROFFESOR) IN SIGNAL PROCESSING IMPERIAL COLLEGE LONDON Compression New techniques have led to the development

More information