Robust Lossless Data Hiding
Yun Q. Shi, Zhicheng Ni, Nirwan Ansari
Electrical and Computer Engineering, New Jersey Institute of Technology
October 2010

Outline
- What is lossless data hiding
- Existing robust lossless data hiding techniques
- Our newly developed technology
  - Bit embedding strategy
  - Error correction code
  - Chaotic mixing
  - Data embedding block diagram
  - Data extraction block diagram
- Experimental results

What is lossless data hiding?
Most data hiding techniques are lossy:
- Spread spectrum (over/underflow, round-off error)
- LSB (bit replacement without memory)
- QIM (quantization error)
[Figure: original Lena image and the image after data hiding]
Lossless data hiding is a data hiding technique in which the original cover media can be recovered without any distortion after data extraction. It is also called reversible, distortion-free, invertible, or distortionless data hiding. It is often used in special applications, such as military and medical imaging, where the original image must be preserved exactly. Most lossless data hiding techniques are fragile: the hidden data can no longer be recovered once compression or another small alteration has been applied to the marked image.

The Only Existing Robust Lossless Data Hiding Technique: De Vleeschouwer et al.'s [1]
It has some robustness against high-quality JPEG compression, i.e., the hidden data can still be retrieved correctly after such compression.
Main idea:
1. Each group (block) of pixels is randomly divided into two sets of equal size, zones A and B.
2. The histogram of each zone is mapped onto a circle: each position on the circle corresponds to a luminance value, and its weight is the number of pixels with that luminance.
[Figure: histogram of pixel values 0, 1, ..., Q-1 (number of pixels per value) wrapped onto a circle, so that Q-1 is adjacent to 0]

3. Vectors C_a and C_b point from the center of the circle to the mass centers of zones A and B, respectively.
Note: modulo-256 addition is used in bit embedding to prevent over/underflow.
[Figure: robustness test with 100 embedded bits]
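The circular interpretation can be made concrete with a short computation of C_a and C_b. This is a minimal Python sketch, not the cited authors' code. It assumes 8-bit pixels (Q = 256) and places luminance l at angle 2πl/Q, which is our reading of the circle figure. The slides do not give the rule that maps a bit to a rotation of these vectors, so the sketch stops at computing the vectors and the angle between them. The names `mass_center` and `signed_angle` are hypothetical.

```python
import math

LEVELS = 256  # number of luminance levels Q (assumption: 8-bit pixels)


def mass_center(zone, levels=LEVELS):
    """Mass center of a zone's histogram wrapped onto the unit circle.

    Luminance l sits at angle 2*pi*l/levels. Each pixel adds unit weight at its
    luminance position, so the histogram count acts as the weight there.
    Returns the (x, y) vector from the circle's center to the mass center.
    """
    if not zone:
        raise ValueError("zone must contain at least one pixel")
    n = len(zone)
    x = sum(math.cos(2 * math.pi * p / levels) for p in zone) / n
    y = sum(math.sin(2 * math.pi * p / levels) for p in zone) / n
    return x, y


def signed_angle(c_a, c_b):
    """Angle from C_b to C_a in radians, wrapped into (-pi, pi]."""
    d = math.atan2(c_a[1], c_a[0]) - math.atan2(c_b[1], c_b[0])
    return math.atan2(math.sin(d), math.cos(d))


# Luminance 255 and 0 are neighbours on the circle, one step of 2*pi/256 apart.
# This is why modulo-256 arithmetic fits the representation.
print(signed_angle(mass_center([255]), mass_center([0])))
```

On the circle, a zone whose pixels are all 255 lies only one angular step away from a zone whose pixels are all 0. The wraparound of modulo-256 addition therefore looks harmless to the vectors, but it is highly visible in the image.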

Drawbacks:
1. Salt-and-pepper noise: because modulo-256 addition is used, a very bright pixel with a gray value close to 255 may turn into a very dark pixel with a gray value close to 0, and vice versa.
2. Low PSNR (see the figures and numbers below).
[Figure: marked medical image (severe salt-and-pepper noise)]
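The salt-and-pepper effect follows directly from modulo-256 arithmetic. Here is a minimal Python illustration. It assumes 8-bit gray values and an arbitrary shift of 4 gray levels, since the actual shift used by the scheme is not given in these slides. The function names are hypothetical.

```python
def add_mod256(pixel, shift):
    """Modulo-256 addition: never over/underflows, but wraps around."""
    return (pixel + shift) % 256


def add_clipped(pixel, shift):
    """Saturating addition, shown only for contrast: distinct inputs can collide, so it is not reversible."""
    return max(0, min(255, pixel + shift))


bright = [250, 253, 255]
dark = [0, 2, 5]
# A bright pixel pushed past 255 becomes nearly black...
print([add_mod256(p, 4) for p in bright])   # [254, 1, 3]
# ...and a dark pixel pushed below 0 becomes nearly white.
print([add_mod256(p, -4) for p in dark])    # [252, 254, 1]
```

Clipping would avoid the flip. However, 253 + 4 and 255 + 4 would both become 255, so the original value could not be restored. This illustrates the trade-off the modulo-256 scheme makes.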

[Figures: four further marked medical images with severe salt-and-pepper noise]
[Figures: three marked medical images with some salt-and-pepper noise]
[Figures: five marked JPEG2000 test images with severe salt-and-pepper noise]
[Figures: two marked JPEG2000 test images with some salt-and-pepper noise]

[Figure: marked JPEG2000 test image (some salt-and-pepper noise)]

2. Low PSNR (results of De Vleeschouwer et al.'s scheme)

Image (512x512)   PSNR of marked image (dB)   Data embedding capacity (bits)   Robustness (bpp)
Mpic1             9.28                        476                              1.0
Mpic2             4.73                        476                              2.0
Mpic3             26.38                       476                              0.8
Mpic4             26.49                       476                              0.6
Mpic5             26.49                       476                              0.6
Mpic6             5.60                        476                              1.6
Mpic7             9.64                        476                              0.8
Mpic8             5.93                        476                              2.8

Image (1536x1920)   PSNR of marked image (dB)   Data embedding capacity (bits)   Robustness (bpp)
N1A                 17.73                       1410                             0.8
N2A                 17.73                       1410                             2.2
N3A                 23.73                       1410                             0.6
N4A                 19.67                       1410                             1.2
N5A                 17.28                       1410                             1.2
N6A                 23.99                       805                              0.6
N7A                 20.66                       1410                             1.4
N8A                 14.32                       805                              1.4

Our Novel Robust Lossless Data Hiding Technique
- No salt-and-pepper noise at all.
- Applicable to commonly used images, medical images, more than 1000 images in the CorelDRAW database, and JPEG2000 test images.
- The average PSNR of marked images is above 39 dB.
- Robust to JPEG2000 and JPEG compression to a certain extent.
- Adjustable data embedding capacity of 512 to 1024 bits (sufficient for authentication purposes).

Main Idea
A block statistic is used as a robust parameter in which to embed information. Each image block is split into two subsets, A and B, in a checkerboard pattern (+ marks a pixel of subset A, - marks a pixel of subset B):

+ - + - + - + -
- + - + - + - +
+ - + - + - + -
- + - + - + - +
+ - + - + - + -
- + - + - + - +
+ - + - + - + -
- + - + - + - +

The difference value α is defined as the arithmetic average of the differences of the pixel pairs:

$$\alpha = \frac{1}{n}\sum_{i=1}^{n}\left(a_i - b_i\right)$$

where n is the number of pixel pairs in the block, and a_i ∈ A and b_i ∈ B are the two pixels of the i-th pair. Experiments show that most values of α are very close to zero.
[Figure: distribution of α for the Boat image, block size 12x12]
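The split and the difference value can be sketched in a few lines of Python. This minimal sketch assumes an even-sized block, so that |A| = |B| = n, and assigns pixel (r, c) to subset A when r + c is even, matching the pattern above. The function names are hypothetical. Because the sum of a_i - b_i does not depend on how the pixels are paired, α equals (sum(A) - sum(B)) / n. `Fraction` keeps α exact for the threshold comparisons used later.

```python
from fractions import Fraction


def split_checkerboard(block):
    """Split a block into subset A ('+' cells) and subset B ('-' cells).

    Cell (r, c) belongs to A when r + c is even (the top-left pixel is '+').
    """
    a, b = [], []
    for r, row in enumerate(block):
        for c, value in enumerate(row):
            (a if (r + c) % 2 == 0 else b).append(value)
    return a, b


def difference_value(block):
    """alpha = (1/n) * sum_i (a_i - b_i), computed exactly as a Fraction.

    The sum does not depend on the pairing, so it equals (sum(A) - sum(B)) / n.
    """
    a, b = split_checkerboard(block)
    if len(a) != len(b):
        raise ValueError("subsets A and B must have the same size n")
    return Fraction(sum(a) - sum(b), len(a))


# A smooth 4x4 block: neighbouring pixels are similar, so alpha is small.
smooth = [[100, 101, 100, 102],
          [101, 100, 102, 101],
          [100, 102, 101, 100],
          [102, 101, 100, 101]]
print(difference_value(smooth))  # -3/4
```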

Because the difference value α is computed from the statistics of all the pixels in a block, it is somewhat robust against compression attacks. For this reason, α is used as the robust quantity in which an information bit is embedded.

Bit Embedding Strategy
Bit 0 is embedded by keeping the difference value α between the thresholds -K and K (K is usually less than 5). Bit 1 is embedded by shifting α beyond K or -K. Modulo-256 addition is not used, so no salt-and-pepper noise is introduced. To overcome the over/underflow problem, the blocks are classified into four categories, and each category has its own bit embedding scheme.

Notes:
1. The shift quantity β is usually twice the threshold K.
2. Shifting towards the right means adding the fixed quantity β to each pixel value marked by + (subset A). Shifting towards the left means subtracting β from each of them.
3. Pixel values marked by - (subset B) are kept intact.

Category 1: The pixel values of the block under consideration are far enough away from both bounds of the histogram that a shift of β in either direction causes no over/underflow.
[Figure: block histogram between 0 and 255, with the margins d_l and d_r and the positions β and 255-β marked]
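Notes 2 and 3 imply that shifting subset A by β moves α by exactly β, because only the a_i terms of the average change. Below is a minimal Python sketch of this idea. It assumes the checkerboard split, integer gray values, and (our reading of the Category 1 figure) that a Category 1 block has every pixel value in [β, 255 - β]. The values K = 2 and β = 2K are illustrative, and all function names are hypothetical.

```python
from fractions import Fraction


def difference_value(block):
    """alpha = mean(A) - mean(B) for the checkerboard split ((r + c) even -> A); assumes |A| == |B|."""
    a = [v for r, row in enumerate(block) for c, v in enumerate(row) if (r + c) % 2 == 0]
    b = [v for r, row in enumerate(block) for c, v in enumerate(row) if (r + c) % 2 == 1]
    return Fraction(sum(a) - sum(b), len(a))


def shift_subset_a(block, beta):
    """Add beta to every '+' pixel (beta > 0: shift right, beta < 0: shift left).

    Pixels of subset B ('-') are kept intact.
    """
    return [[v + beta if (r + c) % 2 == 0 else v for c, v in enumerate(row)]
            for r, row in enumerate(block)]


def is_category1(block, beta):
    """Assumed test: every pixel lies in [beta, 255 - beta], so a +/-beta shift stays in [0, 255]."""
    values = [v for row in block for v in row]
    return min(values) >= beta and max(values) <= 255 - beta


K = 2          # illustrative threshold (the slides say K is usually less than 5)
BETA = 2 * K   # note 1: beta is usually twice K
block = [[120, 118, 121, 119],
         [117, 122, 118, 120],
         [121, 119, 120, 118],
         [118, 121, 117, 122]]
assert is_category1(block, BETA)
assert difference_value(shift_subset_a(block, BETA)) == difference_value(block) + BETA
assert difference_value(shift_subset_a(block, -BETA)) == difference_value(block) - BETA
```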

Case 1. The difference value α lies between -K and K.
1. If the bit to be embedded is 1, α is shifted by β towards the right-hand side if α is positive, or towards the left-hand side if it is negative.
2. If the bit to be embedded is 0, the block is left intact.
[Figure: embedding bit 1: an original α within [-K, K] is shifted right (if positive) or left (if negative) beyond the threshold]
Case 2. The absolute value of α exceeds the threshold K. To preserve losslessness, bit 1 is always embedded, whether the bit to be embedded is 0 or 1, by shifting α by β further away from zero. If such a block were left intact, the extractor would treat it as a shifted block and undo a shift that never happened. This rule may introduce some error bits.
[Figure: an original α beyond -K or K is shifted further left or right to embed bit 1]
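The two Category 1 cases can be sketched as follows. This is a minimal Python sketch, not the authors' implementation. It assumes the checkerboard split, β = 2K, and a boundary convention of our own: |α| < K counts as inside the threshold, |α| ≥ K as beyond it, and α = 0 is shifted to the right. With β = 2K, a Case 1 block shifted for bit 1 lands at |α| ≥ 2K, safely beyond K. The function returns the bit actually embedded, which makes the Case 2 error bits visible. Function names are hypothetical.

```python
from fractions import Fraction


def difference_value(block):
    """alpha = mean(A) - mean(B) for the checkerboard split; assumes |A| == |B|."""
    a = [v for r, row in enumerate(block) for c, v in enumerate(row) if (r + c) % 2 == 0]
    b = [v for r, row in enumerate(block) for c, v in enumerate(row) if (r + c) % 2 == 1]
    return Fraction(sum(a) - sum(b), len(a))


def shift_subset_a(block, beta):
    """Add beta to every subset-A ('+') pixel; subset B is untouched."""
    return [[v + beta if (r + c) % 2 == 0 else v for c, v in enumerate(row)]
            for r, row in enumerate(block)]


def embed_category1(block, bit, k, beta):
    """Embed one bit in a Category 1 block; returns (marked_block, embedded_bit)."""
    alpha = difference_value(block)
    if abs(alpha) < k:
        # Case 1: alpha is inside the threshold.
        if bit == 0:
            return block, 0                       # bit 0: block left intact
        step = beta if alpha >= 0 else -beta      # bit 1: push alpha away from zero
        return shift_subset_a(block, step), 1
    # Case 2: alpha is already beyond the threshold. Bit 1 is embedded regardless
    # of the input bit; if the input bit was 0, this is an error bit left to the ECC.
    step = beta if alpha > 0 else -beta
    return shift_subset_a(block, step), 1
```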

Category 2: The pixel values of the block under consideration are close to the lower bound of the histogram, so subtracting β could cause underflow.
[Figure: block histogram close to 0, with β and 255-β marked]
Case 1. The difference value α lies between -K and K.
1. If the bit to be embedded is 1, α is always shifted by β towards the right-hand side, beyond the threshold K.
2. If the bit to be embedded is 0, the block is left intact.
[Figure: an original α within [-K, K] is shifted right beyond K to embed bit 1]

Case 2. α lies on the right-hand side, beyond the threshold K. Whether the bit to be embedded is 0 or 1, bit 1 is always embedded by shifting α by β further away from zero. This may introduce some error bits, which are corrected by the ECC.
[Figure: an original α beyond K is shifted further right to embed bit 1]
Case 3. α lies on the left-hand side, beyond the threshold -K. A left shift could cause underflow, so neither bit 0 nor bit 1 is embedded: the block is skipped during data embedding, and its coordinates are recorded as side information.
[Figure: an original α beyond -K; neither bit 0 nor bit 1 is embedded]
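The three Category 2 cases can be sketched in the same way. This is a minimal Python sketch under stated assumptions: a checkerboard split, β = 2K, |α| < K treated as inside the threshold, and `None` used to mark a skipped block whose coordinates would go into the side information. With β = 2K, a Case 1 block shifted right moves from (-K, K) into (K, 3K), so it is read back as bit 1. Category 3, described next, would mirror this sketch with -β. Function names are hypothetical.

```python
from fractions import Fraction


def difference_value(block):
    """alpha = mean(A) - mean(B) for the checkerboard split; assumes |A| == |B|."""
    a = [v for r, row in enumerate(block) for c, v in enumerate(row) if (r + c) % 2 == 0]
    b = [v for r, row in enumerate(block) for c, v in enumerate(row) if (r + c) % 2 == 1]
    return Fraction(sum(a) - sum(b), len(a))


def shift_subset_a(block, beta):
    """Add beta to every subset-A ('+') pixel; subset B is untouched."""
    return [[v + beta if (r + c) % 2 == 0 else v for c, v in enumerate(row)]
            for r, row in enumerate(block)]


def embed_category2(block, bit, k, beta):
    """Embed one bit in a Category 2 block (pixel values near 0).

    Only rightward shifts (+beta) are used, since subtracting beta could underflow.
    Returns (marked_block, embedded_bit); embedded_bit is None for a skipped block.
    """
    alpha = difference_value(block)
    if abs(alpha) < k:                 # Case 1
        if bit == 0:
            return block, 0
        return shift_subset_a(block, beta), 1
    if alpha >= k:                     # Case 2: shift further right (may be an error bit)
        return shift_subset_a(block, beta), 1
    return block, None                 # Case 3: alpha <= -k, skip; record as side information
```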

Category 3: The pixel values of the block under consideration are close to the upper bound of the histogram.
[Figure: block histogram close to 255, with β and 255-β marked]
Category 3 mirrors Category 2: the block is close to the upper bound instead of the lower bound. Its embedding algorithm is therefore the same as that of Category 2, except that α is shifted by β towards the left-hand side instead of the right-hand side.

Category 4: Some pixel values of the block under consideration are close to the upper bound, while others are close to the lower bound of the histogram.
[Figure: block histogram touching both 0 and 255, with β and 255-β marked]
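The slides describe the four categories only qualitatively. One plausible way to make them concrete is to call a block close to a bound when a shift of β towards that bound could leave [0, 255]. The thresholds below are our assumption, not taken from the source. The rule also conservatively inspects all pixels of the block, although only subset A is actually shifted. The name `classify_block` is hypothetical.

```python
def classify_block(block, beta):
    """Return the category (1-4) of a block; hypothetical rule based on its min/max pixel values."""
    values = [v for row in block for v in row]
    near_low = min(values) < beta           # subtracting beta could underflow
    near_high = max(values) > 255 - beta    # adding beta could overflow
    if near_low and near_high:
        return 4                            # no shift is safe
    if near_low:
        return 2                            # only rightward shifts (+beta) are safe
    if near_high:
        return 3                            # only leftward shifts (-beta) are safe
    return 1                                # both directions are safe


print(classify_block([[120, 130], [125, 128]], 4))  # 1
```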

Case 1. The difference value α lies between -K and K. Whether the bit to be embedded is 0 or 1, bit 0 is always embedded by leaving the block intact. This may introduce some error bits (whenever the bit to be embedded is 1).
[Figure: an original α within [-K, K], left unchanged]
Case 2. The absolute value of α exceeds the threshold K. A shift in either direction could cause over/underflow, so neither bit 0 nor bit 1 is embedded: the block is skipped during data embedding, and its coordinates are recorded as side information.
[Figure: an original α beyond -K or K, left unchanged]
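Category 4 never modifies pixels, which makes its rule the simplest of the four. This is a minimal Python sketch that assumes the checkerboard split and treats |α| < K as inside the threshold. `None` marks a skipped block, whose coordinates would go into the side information. The function name is hypothetical.

```python
from fractions import Fraction


def difference_value(block):
    """alpha = mean(A) - mean(B) for the checkerboard split; assumes |A| == |B|."""
    a = [v for r, row in enumerate(block) for c, v in enumerate(row) if (r + c) % 2 == 0]
    b = [v for r, row in enumerate(block) for c, v in enumerate(row) if (r + c) % 2 == 1]
    return Fraction(sum(a) - sum(b), len(a))


def embed_category4(block, bit, k):
    """Category 4: no pixel is ever modified.

    Returns (block, embedded_bit). embedded_bit is 0 in Case 1 (an error bit whenever
    the input bit is 1) and None in Case 2 (block skipped, recorded as side information).
    The input bit is deliberately ignored.
    """
    alpha = difference_value(block)
    if abs(alpha) < k:
        return block, 0
    return block, None
```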

The four categories above cover every situation a block may encounter during data embedding. The detailed description of the bit embedding method shows that the modified pixel values always remain in [0, 255], so no over/underflow takes place. The following table gives statistics indicating that situations in which error bits may be produced, or bookkeeping data is needed, are rather rare.

Table 3. Category and case occurrence (number of blocks) for some test images (Ck-j = Category k, Case j).

Image    C1-1  C1-2  C2-1  C2-2  C2-3  C3-1  C3-2  C3-3  C4-1  C4-2
Lena     1024  0     0     0     0     0     0     0     0     0
Baboon   999   3     22    0     0     0     0     0     0     0
Boat     987   0     27    0     0     8     0     0     2     0
Mpic1    536   0     319   1     0     147   0     0     19    2
Mpic2    141   0     646   0     0     146   0     0     88    3
Mpic3    1012  0     0     0     0     0     0     0     12    0
Mpic4    1012  0     0     0     0     1     0     0     11    0
Mpic5    1012  0     0     0     0     1     0     0     11    0
Mpic6    242   0     646   0     0     82    0     0     52    2
Mpic7    517   0     451   0     0     44    0     0     11    1
Mpic8    366   0     593   0     0     57    0     0     8     0
N1A      3313  0     927   0     0     794   0     0     86    0
N2A      1479  0     1404  1     2     847   0     0     1382  5
N3A      4143  1     408   0     0     475   0     1     92    0
N4A      3151  0     160   0     0     1546  0     0     263   0
N5A      1542  1     114   0     0     3027  8     11    405   12
N6A      4273  0     79    0     0     707   0     0     61    0
N7A      3089  3     1009  0     0     169   1     1     839   9
N8A      822   0     2178  1     1     265   0     1     1841  11

Error Correction Code (ECC)
As mentioned above, the bit embedding process may introduce a few error bits. An error correction code (ECC) is used to correct them. Depending on the situation, this technique uses BCH(15,11,1), BCH(15,7,2), BCH(15,5,3), BCH(31,6,7), or BCH(63,7,15). Here BCH(n,k,t) denotes a code whose n-bit codewords carry k message bits and can correct up to t bit errors.

Chaotic Mixing
In some images, error bits are concentrated in a few small areas. This places many error bits in a single codeword and causes errors in data extraction. To combat this type of burst error, chaotic mixing is used.
Algorithm: r' = A r, where r is the position of an element and

$$A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$$
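Here is a minimal Python sketch of chaotic mixing applied to an N×N grid, for example the coded watermark bits laid out as a square. As in the standard cat map, it assumes that positions are taken modulo N, a detail the slide formula leaves implicit. The square layout and the function names are also assumptions. Because det A = 1, A is invertible modulo N with A^-1 = [[1, -1], [-1, 2]]. The inverse mixing at the extractor uses this inverse, and it spreads a burst of adjacent extraction errors across many codewords.

```python
def chaotic_mix(grid, times=1):
    """Move the element at position r = (x, y) to A r (mod N), with A = [[2, 1], [1, 1]].

    Applied `times` times (the chaotic mixing number n). The grid must be N x N.
    """
    n = len(grid)
    out = [row[:] for row in grid]
    for _ in range(times):
        nxt = [[None] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                nxt[(2 * x + y) % n][(x + y) % n] = out[x][y]
        out = nxt
    return out


def inverse_chaotic_mix(grid, times=1):
    """Undo chaotic_mix using A^-1 = [[1, -1], [-1, 2]] (valid because det A = 1)."""
    n = len(grid)
    out = [row[:] for row in grid]
    for _ in range(times):
        nxt = [[None] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                nxt[(x - y) % n][(2 * y - x) % n] = out[x][y]
        out = nxt
    return out


bits = [[(x * 7 + y) % 2 for y in range(8)] for x in range(8)]
assert inverse_chaotic_mix(chaotic_mix(bits, 5), 5) == bits
```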

[Figure: chaotic mixing of the Baboon image by A^n: (a) original image, (b) n = 1, (c) n = 5, (d) n = 10, where n is the chaotic mixing number]

Data Embedding Block Diagram
[Figure: the watermark signal passes through error correction coding and chaotic mixing; the original image is split into blocks; bit embedding produces the marked image and the side information]

Bit Extraction Strategy
Data extraction is the reverse of data embedding. Blocks whose coordinates were recorded as side information are skipped.
1. If the absolute value of α is beyond the threshold K, bit 1 is extracted, and α is shifted back towards zero by subtracting or adding β, depending on its sign. This restores the original difference value, which means every pixel value of subset A returns to its original value.
[Figure: the α of a marked block beyond -K or K is shifted back to its original position]
2. If the absolute value of α is within the threshold K, bit 0 is extracted, and the pixel values of the block are left unchanged.
3. After bit extraction, inverse chaotic mixing and then ECC decoding are applied to recover the originally embedded bit sequence.
[Figure: data extraction block diagram: marked image and side information → bit extraction → inverse chaotic mixing → decoding → extracted mark; bit extraction also yields the reconstructed image]
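Rules 1 and 2 can be sketched in Python for a single block. This minimal sketch is not the authors' implementation. It assumes the checkerboard split, β = 2K, and |α| < K treated as inside the threshold, and it assumes that blocks listed in the side information have already been filtered out. Function names are hypothetical.

```python
from fractions import Fraction


def difference_value(block):
    """alpha = mean(A) - mean(B) for the checkerboard split; assumes |A| == |B|."""
    a = [v for r, row in enumerate(block) for c, v in enumerate(row) if (r + c) % 2 == 0]
    b = [v for r, row in enumerate(block) for c, v in enumerate(row) if (r + c) % 2 == 1]
    return Fraction(sum(a) - sum(b), len(a))


def shift_subset_a(block, beta):
    """Add beta to every subset-A ('+') pixel; subset B is untouched."""
    return [[v + beta if (r + c) % 2 == 0 else v for c, v in enumerate(row)]
            for r, row in enumerate(block)]


def extract_bit(block, k, beta):
    """Extract one bit from a non-skipped block and restore its pixels.

    Returns (bit, restored_block).
    """
    alpha = difference_value(block)
    if abs(alpha) < k:
        return 0, block                        # rule 2: bit 0, nothing to undo
    step = -beta if alpha > 0 else beta        # rule 1: shift alpha back towards zero
    return 1, shift_subset_a(block, step)
```

Every block with |α| ≥ K after embedding was shifted away from zero by exactly β, so a single shift back towards zero restores it in all categories. This holds whether the block was a Case 1 bit-1 block or a Case 2 block.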

Experimental Results
The algorithm has been applied to:
- commonly used grayscale images such as Lena and Baboon;
- eight medical images;
- more than 1000 images in the CorelDRAW database;
- eight JPEG2000 color test images.
The results demonstrate that it can be applied successfully to all of these test images.
[Figure: medical image, (a) original, (b) marked]

[Figures: five further medical images, each shown as (a) original and (b) marked]
[Figures: two JPEG2000 test images, each shown as (a) original and (b) marked]
[Figure: Lena, (a) original, (b) marked]
[Figure: Baboon, (a) original, (b) marked]
[Figure: CorelDRAW image, (a) original, (b) marked]

Test results for more than 1000 images in the CorelDRAW database (512x768 images)
                                  Max    Min    Avg.
PSNR of marked image (dB)         45.2   37.4   40.2
Robustness (bpp)                  2.0    0.2    1.21
Data embedding capacity (bits):   714

Test results for eight medical images
Image (512x512)   PSNR of marked image (dB)   Data embedding capacity (bits)   Robustness (bpp)
Mpic1             40.4                        768                              0.8
Mpic2             40.8                        560                              0.8
Mpic3             40.3                        792                              0.6
Mpic4             40.3                        792                              1.0
Mpic5             40.3                        792                              0.8
Mpic6             40.7                        560                              0.8
Mpic7             40.4                        768                              0.4
Mpic8             40.6                        560                              0.8

Test results for eight JPEG2000 test images
Image (1536x1920)   PSNR of marked image (dB)   Data embedding capacity (bits)   Robustness (bpp)
N1A                 45.1                        1398                             0.8
N2A                 43.1                        1398                             1.6
N3A                 45.1                        1398                             1.0
N4A                 45.2                        1398                             1.0
N5A                 45.5                        1200                             1.0
N6A                 45.0                        1267                             0.4
N7A                 40.6                        1398                             1.2
N8A                 41.5                        798                              1.4

Conclusion
The newly developed robust lossless image data hiding technique has the following advantages:
1. No annoying salt-and-pepper noise at all.
2. Applicable to virtually all images (including commonly used images, medical images, more than 1000 images in the CorelDRAW database, and JPEG2000 test images).
3. The average PSNR of marked images is above 39 dB.
4. Robust to JPEG/JPEG2000 compression to a certain extent.
5. The data embedding capacity ranges from 512 to 1024 bits (often sufficient for authentication purposes) and can be adjusted as required.

Application Scenarios
- Authentication
- Medical data system
- On-line identity verification
- Others

Application 1: Authentication of JPEG2000 images
Traditional authentication schemes fail in some JPEG2000 application scenarios, because compression with different implementations, different transcoding schemes, and multiple compression cycles may introduce incidental alterations.
An authentication framework for JPEG2000 images is needed:
- for both integrity and non-repudiation purposes;
- including content-level security solutions for JPEG2000 that achieve both security and robustness.

[Figure: pixel difference between the original and the decoded image]

A Unified Authentication Framework for JPEG2000 Images
- Cryptographic strength
- The feature signature is embedded into the image
- Fragile and semi-fragile authentication
- Lossy and lossless compression, hence:
  - a lossy module for semi-fragile authentication
  - a lossless module for semi-fragile authentication
- Has been included in the JPEG2000 Security Part, JPSEC.

[Figure: system overview of the unified authentication system]

Application 2: Secure Medical Data System
- Relevant or associated information can be hidden inside medical data (records, documents, etc.).
- Who's who will not be mixed up: easy handling.
- Sensitive information will not be compromised, even after the hidden data have been extracted, if encryption is used properly.
- Vertically: multi-level security.
- Horizontally: one doctor or insurance person cannot access others' data.
- Can be used for other applications as well.
UMDNJ, PACS, authentication

Application 3: On-line Identity Verification
A paper to be published in IWDW04.
[Figure: sender's fingerprint image]
Features of the fingerprint and the sender information are reversibly embedded into the fingerprint image. Verification is conducted at the receiver side by accessing a central database.

Summary
(Robust) reversible data hiding has opened a new door for data hiding:
- Methodology
- Linking original media with headers

Acknowledgement
- New Jersey Commission of Science and Technology, via NJWINS
- Digital Data Embedding Technologies group of the Air Force Research Laboratory, Rome Research Site, Information Directorate, Rome, NY, under contract F30602-03-1-0264

References
1. C. De Vleeschouwer, J. F. Delaigle, and B. Macq, "Circular interpretation on histogram for reversible watermarking," IEEE International Multimedia Signal Processing Workshop, France, pp. 345-350, October 2001.
2. Z. Ni, Y. Q. Shi, N. Ansari, W. Su, Q. Sun, and X. Lin, "Robust lossless data hiding," IEEE International Conference on Multimedia and Expo (ICME04), Taipei, Taiwan, June 2004.
3. Z. Ni, Y. Q. Shi, N. Ansari, W. Su, Q. Sun, and X. Lin, "Robust lossless image data hiding designed for semi-fragile image authentication," IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 4, pp. 497-509, April 2008.
4. Z. Zhang, Q. Sun, X. Lin, Y. Q. Shi, and Z. Ni, "A unified authentication framework for JPEG2000 images," IEEE International Conference on Multimedia and Expo (ICME04), Taipei, Taiwan, June 2004.
5. A joint proposal by the Institute of Infocomm Research, Singapore, and NJIT, entitled "A Unified Authentication System for JPEG2000 Images," was included in Secure JPEG2000 (JPSEC), an ISO standard, in April 2007. JPSEC (ISO/IEC 15444-8:2007) specifies the framework, concepts, and methodology for securing JPEG2000 codestreams.