Forensic analysis of JPEG image compression

Forensic analysis of JPEG image compression Visual Information Privacy and Protection (VIPP Group) Course on Multimedia Security 2015/2016

Introduction

Summary: Introduction; the JPEG (Joint Photographic Experts Group) standard; forensic analysis of JPEG images

What is JPEG? JPEG (Joint Photographic Experts Group) is an international standard for lossy image compression released in 1992. JPEG is still today one of the most popular image formats on the Web: it is used by 73.5% of all websites. Source: https://w3techs.com/technologies/overview/image_format/all (updated April 2016)

What is JPEG? JPEG is used in many applications. It is particularly suitable for compressing photographs and paintings of realistic scenes with smooth variations of tone and color. Compared with the also widely used GIF format, JPEG ensures better visual quality of compressed images for the same file size.

How does it work? JPEG achieves a good trade-off between visual quality and compression efficiency by exploiting a number of algorithms: Color Space Transform and subsampling; Discrete Cosine Transform (DCT); Quantization; Zig-zag ordering; Differential Pulse Code Modulation (DC component); Run Length Encoding (AC components); Entropy Coding (Huffman or Arithmetic).

JPEG baseline encoding. Main steps:
1. Discrete Cosine Transform of each 8x8 pixel block: f(i, j) -> F(u, v)
2. Scalar quantization: F(u, v) -> Fq(u, v)
3. Zig-zag scan to exploit redundancy
4. Differential Pulse Code Modulation (DPCM) on the DC component and Run Length Encoding of the AC components
5. Entropy coding (Huffman)
The Y, Cb and Cr channels are processed separately. The compressed stream carries a header, the quantization and coding tables, and the entropy-coded data. Decoding applies the steps in reverse order.

Color space transform: RGB to YCbCr. RGB is not the only color space in which an image can be represented; there are several others, each with its own properties. A popular color space in image compression is YCbCr, which: o separates luminance (Y) from color information (Cb, Cr) o allows Y and (Cb, Cr) to be processed separately (not possible in RGB) RGB to YCbCr and YCbCr to RGB are linear conversions.
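The linear conversions can be sketched as follows; since the slide's matrices are not reproduced in the transcript, the standard full-range JFIF (ITU-R BT.601) coefficients are assumed here:

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range RGB -> YCbCr (assumed JFIF / ITU-R BT.601 coefficients)."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Inverse linear conversion, back to RGB."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b
```

A neutral gray maps to Y equal to the gray level and Cb = Cr = 128, and the two conversions invert each other up to rounding of the constants.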

Color space transform example

Color space transform subsampling. Y is taken for every pixel, while Cb and Cr are taken once for each 2x2 block of pixels. Data size is reduced by half without significant loss of visual quality. Example: a 64x64 block. Without subsampling, one must take 64 2 pixel values for each color channel: 3 x 64 2 = 12288 values (1 byte per value). JPEG takes 64 2 values for Y and 2 x 32 2 values for chroma: 64 2 + 2 x 32 2 = 6144 values (1 byte per value).
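The slide's arithmetic can be checked in a couple of lines:

```python
# Sizes for a 64x64 area, 1 byte per value, with 2x2 chroma subsampling.
n = 64
no_subsampling = 3 * n * n                     # Y, Cb, Cr all at full resolution
with_subsampling = n * n + 2 * (n // 2) ** 2   # full-res Y, quarter-res Cb and Cr
assert no_subsampling == 12288
assert with_subsampling == 6144                # exactly half the data
```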

Discrete Cosine Transform (DCT). Transformed data are more suitable for compression (e.g. skewed probability distribution, reduced correlation). The transform itself is lossless.

Forward DCT:
F(u, v) = (1/4) C(u) C(v) Σ_{x=0..7} Σ_{y=0..7} f(x, y) cos[(2x+1)uπ/16] cos[(2y+1)vπ/16], for u, v = 0, ..., 7

Inverse DCT:
f(x, y) = (1/4) Σ_{u=0..7} Σ_{v=0..7} C(u) C(v) F(u, v) cos[(2x+1)uπ/16] cos[(2y+1)vπ/16], for x, y = 0, ..., 7

where C(k) = 1/√2 for k = 0 and C(k) = 1 otherwise.
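A direct, unoptimized sketch of the two formulas in plain Python; real codecs use fast factorized DCTs, but this version mirrors the equations term by term:

```python
import math

def dct2_8x8(f):
    """Forward 8x8 DCT-II, exactly as in the formula above."""
    C = lambda k: 1 / math.sqrt(2) if k == 0 else 1.0
    F = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = sum(f[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / 16)
                    * math.cos((2 * y + 1) * v * math.pi / 16)
                    for x in range(8) for y in range(8))
            F[u][v] = 0.25 * C(u) * C(v) * s
    return F

def idct2_8x8(F):
    """Inverse transform; recovers f(x, y) from F(u, v)."""
    C = lambda k: 1 / math.sqrt(2) if k == 0 else 1.0
    f = [[0.0] * 8 for _ in range(8)]
    for x in range(8):
        for y in range(8):
            s = sum(C(u) * C(v) * F[u][v]
                    * math.cos((2 * x + 1) * u * math.pi / 16)
                    * math.cos((2 * y + 1) * v * math.pi / 16)
                    for u in range(8) for v in range(8))
            f[x][y] = 0.25 * s
    return f
```

For a constant block f(x, y) = 100, all the energy lands in the DC coefficient F(0, 0) = 800, and the inverse transform recovers the block up to floating-point error, confirming that the DCT step itself loses nothing.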

Quantization. Goal: reduce the number of bits per sample. In each 8x8 DCT block, F(u, v) is divided by the corresponding entry of an 8x8 quantization matrix Q, where Q(u, v) is the quantization step at frequency (u, v). Example: F = 45 = (101101) 2 (6 bits). o Truncate to 4 bits (Q = 4): (1011) 2 = 11 (de-quantize: 11 x 4 = 44 against 45). o Truncate to 3 bits (Q = 8): (101) 2 = 5 (de-quantize: 5 x 8 = 40 against 45). Quantization error is the main reason why JPEG compression is LOSSY.
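The slide's numeric example, following its truncation convention (a real JPEG encoder usually rounds to the nearest integer instead):

```python
def quantize(F, Q):
    """Divide a DCT coefficient by its quantization step and truncate."""
    return F // Q          # floor division = truncation for positive values

def dequantize(Fq, Q):
    """Bring a quantized coefficient back to its original range."""
    return Fq * Q

# F = 45: larger steps keep fewer bits and lose more precision.
assert quantize(45, 4) == 11 and dequantize(quantize(45, 4), 4) == 44
assert quantize(45, 8) == 5 and dequantize(quantize(45, 8), 8) == 40
```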

Quantization. Each F(u, v) in an 8x8 block is divided by the constant value Q(u, v). Higher values in the quantization matrix Q achieve better compression at the cost of visual quality. How to choose Q? The eye is more sensitive to low frequencies (upper left corner of the 8x8 block) and less sensitive to high frequencies (lower right corner). Idea: quantize the high frequencies more (large quantization step) and the low frequencies less. The values of Q are controlled by a parameter called Quality Factor (QF), which ranges from 100 (best quality) down to 1 (extremely low quality).

Quantization: luminance Quantization table Q for QF = 50

Quantization: chrominance. Quantization table Q for QF = 50. Color can be quantized more coarsely due to the reduced sensitivity of the Human Visual System (HVS) to chrominance.

Quantization: luminance and chrominance. An example of quantization table Q for QF = 70. Quantization is less severe at larger QF values.

No JPEG (20 MB), JPEG 100 (9 MB), JPEG 60 (1.3 MB), JPEG 20 (0.6 MB), JPEG 5 (0.4 MB)

Zig-zag ordering of quantized coefficients For further processing, each 8x8 block is converted to a 1x64 vector To do so, JPEG adopts a method called zig-zag scan, which packs together low, medium and high frequency coefficients

Zig-zag ordering of quantized coefficients. Packing coefficients in a clever way: the scan visits the positions of the 8x8 block in the following order (low frequencies top-left, high frequencies bottom-right):

 0  1  5  6 14 15 27 28
 2  4  7 13 16 26 29 42
 3  8 12 17 25 30 41 43
 9 11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 48 49 57 58 62 63

The high-frequency coefficients at the end of the scan are typically very small or zero, so for RLE it is good to have them packed together. A normal (e.g. column-wise) ordering would mix low, medium and high frequencies; the zig-zag scan sorts them from low to high.
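The scan order in the table above can also be generated programmatically; one compact sketch sorts the block positions by anti-diagonal, alternating the direction within each diagonal:

```python
def zigzag_order(n=8):
    """(row, col) pairs of an n x n block in JPEG zig-zag scan order."""
    def key(rc):
        r, c = rc
        d = r + c
        # odd anti-diagonals run top-right -> bottom-left, even ones the reverse
        return (d, r if d % 2 else -r)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

order = zigzag_order()
index = {rc: i for i, rc in enumerate(order)}   # position -> scan index
```

`index` reproduces the table above, e.g. index[(0, 0)] == 0, index[(1, 0)] == 2 and index[(7, 7)] == 63.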

DPCM on DC component. The DC component is large and can assume many different values; however, the difference between the DCs of two adjacent blocks is often small. To save bits, encode each DC as the difference from the DC of the previous 8x8 block. o This procedure is called Differential Pulse Code Modulation (DPCM)

DPCM on DC component. Pixels: uniform distribution in [0, 255] (left). DPCM differences: Laplacian distribution sharply peaked at 0 (right). The entropy of the error image is much smaller than that of the original image.
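DPCM on the DC components is a one-line difference; a minimal sketch:

```python
def dpcm_encode(dc_values):
    """Replace each DC coefficient by its difference from the previous block's DC."""
    prev, diffs = 0, []
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

def dpcm_decode(diffs):
    """Invert DPCM by accumulating the differences."""
    out, prev = [], 0
    for d in diffs:
        prev += d
        out.append(prev)
    return out
```

For DC values 48, 40, 41 the encoder emits 48, -8, 1: small differences that take fewer bits to code than the raw values.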

RLE on AC components. The 1x64 vectors contain many zeros, more so towards the end of the vector. o Entries towards the end of the vector correspond to higher-frequency DCT components, which tend to carry less of the content. Encode a run of 0s as a (skip, value) pair, where skip is the number of zeros and value is the next non-zero coefficient. o Send (0, 0) as the end-of-block sentinel value.
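A minimal sketch of this run-length scheme (it ignores the 16-zero escape used by real JPEG encoders):

```python
def rle_ac(ac):
    """Encode AC coefficients as (skip, value) pairs, with (0, 0) = end of block."""
    pairs, skip = [], 0
    for v in ac:
        if v == 0:
            skip += 1
        else:
            pairs.append((skip, v))
            skip = 0
    pairs.append((0, 0))   # end-of-block sentinel; trailing zeros are implied
    return pairs
```

For example, rle_ac([0, 0, 5, 0, -3, 0, 0, 0]) yields [(2, 5), (1, -3), (0, 0)]: the trailing run of zeros costs nothing beyond the sentinel.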

Entropy coding. DC components are differentially coded as (SIZE, Value). o The code for a Value is derived from the following table:

SIZE  Values                              Code
0     0                                   ---
1     -1, 1                               0, 1
2     -3, -2, 2, 3                        00, 01, 10, 11
3     -7, ..., -4, 4, ..., 7              000, ..., 011, 100, ..., 111
4     -15, ..., -8, 8, ..., 15            0000, ..., 0111, 1000, ..., 1111
...
11    -2047, ..., -1024, 1024, ..., 2047

Entropy coding. DC components are differentially coded as (SIZE, Value). The code for a SIZE is derived from the following table:

SIZE  Length  Code
0     2       00
1     3       010
2     3       011
3     3       100
4     3       101
5     3       110
6     4       1110
7     5       11110
8     6       111110
9     7       1111110
10    8       11111110
11    9       111111110

Example: a DC component is 40 and the previous DC component is 48. The difference is -8, so it is coded as 1010111: o -8 falls in category SIZE = 4, whose code in the table above is 101. o Within that category, the value code for -8 is 0111 (see the Size-and-Value table).
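Both tables follow a simple rule that can be sketched in a few lines: the SIZE category is the bit length of the magnitude, and the value code is the binary magnitude for positive differences or its one's complement for negative ones (the standard JPEG convention behind the slide's tables):

```python
def dc_size(diff):
    """SIZE category: number of bits needed for the magnitude of the difference."""
    return 0 if diff == 0 else abs(diff).bit_length()

def dc_value_bits(diff):
    """Value code: binary magnitude if positive, one's complement if negative."""
    s = dc_size(diff)
    if s == 0:
        return ""
    n = diff if diff > 0 else diff + (1 << s) - 1
    return format(n, "0{}b".format(s))

# Huffman codes for SIZE, from the table above.
size_code = {0: "00", 1: "010", 2: "011", 3: "100", 4: "101", 5: "110",
             6: "1110", 7: "11110", 8: "111110", 9: "1111110",
             10: "11111110", 11: "111111110"}

# The slide's example: DC difference 40 - 48 = -8 -> SIZE 4 -> 101 + 0111
assert size_code[dc_size(-8)] + dc_value_bits(-8) == "1010111"
```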

Forensic Analysis of JPEG images

JPEG compression footprints. Like any other image processing operation, JPEG leaves traces in the image, especially at low quality factors. o Such traces can be exploited to gather useful information about the image. Some JPEG artifacts are immediately identified: o Blocking, due to block discontinuities o Ringing on edges, due to the DCT o Graininess, due to coarse quantization o Blurring, due to high frequency removal Other (statistical) alterations are far more subtle to identify!

Blocking artifacts. Processing each 8x8 block independently introduces discontinuities along the block boundaries, making the image tiling visible.

Ringing artifacts. Spurious signals near sharp transitions. o Visually, they appear as bands or ghosts o Particularly evident along edges and in text images

Graininess artifacts Particularly evident as dots along the edges

Blurring artifacts. Removing high-frequency DCT coefficients increases the smoothness of the image, retaining shapes but making textures less distinguishable. o The human eye is particularly good at spotting smoothness.

Double JPEG compression: footprints. Double JPEG compression occurs when an image is JPEG compressed first with QF 1 and then JPEG compressed again with QF 2. It leaves statistical footprints due to the double quantization. In multimedia forensics, several approaches have been proposed to reveal the footprints (periodic artifacts) left by double compression.

Statistical footprints: double quantization. Why is it important to understand whether an image has been JPEG compressed (quantized) twice?

Suppose you took this nice picture with your camera. Imagine that this picture did not undergo any compression (a TIF image, for example).

Download an image from the Internet. It is very likely a .jpg file, i.e. JPEG compressed with a certain QF. Start your favorite image editing software.

Create a fake, realistic and deceptive image. Save your effort as JPEG How can one reveal your manipulation?

By observing that: this region has been quantized twice (once in the image you downloaded and once when you saved the fake), while all the rest has been quantized only once (when you saved the fake).

Single quantization (SQ). Quantization is the point-wise operation Qa(x) = floor(x/a), where: o a is a strictly positive integer called the quantization step o floor rounds its argument down to the largest previous integer De-quantization, x' = a * Qa(x), brings the quantized values back to their original range. Qa is not invertible because of the truncation operation.

Double quantization (DQ). Double quantization is again a point-wise operation, Qab(x) = floor(floor(x/b) * b / a), where: o b and a are the quantization steps of the first and second quantization Double quantization can be represented as a sequence of three steps: 1. Quantization with step b 2. De-quantization with step b 3. Quantization with step a

Double quantization footprints (1/2). When a < b, some bins are empty (holes). This happens because the second quantization re-distributes the quantized coefficients into more bins than the first quantization. Consider a signal whose samples are normally distributed in [0, 127]. The histogram of the signal quantized with step 2 is the following: The histogram of the signal quantized with step 3 followed by step 2 is:

Double quantization footprints (2/2). When a > b, some bins contain more samples than neighbouring bins. This happens because even bins receive samples from four original bins, while the odd bins receive samples from only two. Consider the same signal, now quantized with step 3. Its histogram is: The histogram of the signal quantized with step 2 followed by step 3 is:
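Both footprints can be reproduced on a toy signal in a few lines (uniform samples over [0, 127] instead of the slide's normal distribution, which is enough to expose the periodic pattern):

```python
from collections import Counter

def double_quantize(x, first, second):
    """Quantize with the first step, de-quantize, then quantize with the second."""
    return ((x // first) * first) // second

samples = range(128)          # toy signal: one sample per value in [0, 127]

# First step 3, then step 2: the final histogram has periodic empty bins (holes).
holes_hist = Counter(double_quantize(x, 3, 2) for x in samples)
holes = [b for b in range(max(holes_hist)) if holes_hist[b] == 0]

# First step 2, then step 3: some bins collect more samples than their neighbours.
peaks_hist = Counter(double_quantize(x, 2, 3) for x in samples)
```

Here `holes` starts 2, 5, 8, ... and `peaks_hist[0] > peaks_hist[1]`: exactly the periodic artifacts a double-compression detector looks for.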

Double JPEG compression forensics. Double quantization occurs when an image is JPEG compressed first with QF 1 and then JPEG compressed again with QF 2 (remember: the QF rules the choice of the quantization table). Rule of thumb: typically, the former quality factor is lower than the latter (QF 1 < QF 2). This is the most frequent case in practice, and it allows more reliable detection of double JPEG compression.

Detection of double JPEG compression. Image forensics proposes several detectors of double JPEG compression:
o Huang, Fangjun, Jiwu Huang, and Yun Qing Shi. "Detecting double JPEG compression with the same quantization matrix." IEEE Transactions on Information Forensics and Security 5.4 (2010): 848-856.
o Bianchi, Tiziano, and Alessandro Piva. "Detection of nonaligned double JPEG compression based on integer periodicity maps." IEEE Transactions on Information Forensics and Security 7.2 (2012): 842-848.
o Pevný, Tomáš, and Jessica Fridrich. "Detection of double-compression in JPEG images for applications in steganography." IEEE Transactions on Information Forensics and Security 3.2 (2008): 247-258.
o Bianchi, Tiziano, and Alessandro Piva. "Detection of non-aligned double JPEG compression with estimation of primary compression parameters." 2011 18th IEEE International Conference on Image Processing (ICIP). IEEE, 2011.
o Lukáš, Jan, and Jessica Fridrich. "Estimation of primary quantization matrix in double compressed JPEG images." Proc. Digital Forensic Research Workshop, 2003.
o Fu, Dongdong, Yun Q. Shi, and Wei Su. "A generalized Benford's law for JPEG coefficients and its applications in image forensics." Electronic Imaging 2007. International Society for Optics and Photonics, 2007.
o He, Junfeng, et al. "Detecting doctored JPEG images via DCT coefficient analysis." Computer Vision - ECCV 2006. Springer Berlin Heidelberg, 2006. 423-435.
o Popescu, Alin C., and Hany Farid. "Statistical tools for digital forensics." Information Hiding. Springer Berlin Heidelberg, 2004.

One possible approach (1/4). Use machine learning techniques (Support Vector Machines) to build a detector that can distinguish between single-quantized histograms (without holes) and double-quantized histograms (with holes). What is an SVM? An SVM is a supervised learning methodology that analyzes data for classification. Given a set of labeled training examples (each marked as belonging to one of two categories), an SVM training algorithm builds a classifier that assigns new examples to one category or the other.

One possible approach (2/4). Step 1: preparation of the image data sets. Gather a rather large number of uncompressed (TIF) images (~500-1000). o Compress each image once with a relatively low QF (e.g. 70) to create the examples of the first class of images (C1) o Compress each image twice, first with QF = 70 (e.g.) and then with a larger QF (e.g. 90), to create the examples of the second class of images (C2) o (Look at the peak and gap artifacts!)

One possible approach (3/4). Step 2: compute histograms of DCT coefficients. For each image of the single-quantized class (C1): o Divide the image into 8x8 blocks and compute the DCT of each block o Compute the 64 DCT histograms (1 DC, 63 AC) and concatenate them all o This vector must be fed to the SVM as an example of the first class For each image of the double-quantized class (C2): o Divide the image into 8x8 blocks and compute the DCT of each block o Compute the 64 DCT histograms (1 DC, 63 AC) and concatenate them all o This vector must be fed to the SVM as an example of the second class
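A plain-Python sketch of Step 2; the histogram range [-50, 50] is an illustrative choice (not from the slides), and a practical implementation would use an optimized DCT:

```python
import math

def dct2(block):
    """8x8 forward DCT of one block (values already shifted to be zero-centred)."""
    C = lambda k: 1 / math.sqrt(2) if k == 0 else 1.0
    return [[0.25 * C(u) * C(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for x in range(8) for y in range(8))
             for v in range(8)] for u in range(8)]

def dct_histogram_features(img, lo=-50, hi=50):
    """Concatenate the 64 per-frequency histograms (1 DC + 63 AC) of the
    rounded DCT coefficients over all 8x8 blocks: the SVM feature vector."""
    h, w = len(img), len(img[0])
    hists = [[0] * (hi - lo + 1) for _ in range(64)]
    for r in range(0, h - 7, 8):
        for c in range(0, w - 7, 8):
            block = [[img[r + x][c + y] - 128 for y in range(8)] for x in range(8)]
            F = dct2(block)
            for u in range(8):
                for v in range(8):
                    val = max(lo, min(hi, round(F[u][v])))
                    hists[8 * u + v][val - lo] += 1
    return [count for hh in hists for count in hh]
```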

One possible approach (4/4). Step 3: train a Support Vector Machine. Choose 90% of the images of each class to train the SVM (with N-fold cross-validation). o Use the LIBSVM MATLAB toolbox (https://www.csie.ntu.edu.tw/~cjlin/libsvm/) Step 4: test the above Support Vector Machine. Use the remaining 10% of the images of each class to evaluate the classification accuracy.
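Steps 3-4 can be sketched in Python with scikit-learn's SVC, which wraps the same LIBSVM library as the MATLAB toolbox mentioned above; the 90/10 split and 5-fold cross-validation mirror the slides, everything else is an illustrative assumption:

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

def train_and_test(features, labels, seed=0):
    """90/10 stratified split, cross-validate on the training part, then test."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, labels, train_size=0.9, stratify=labels, random_state=seed)
    clf = SVC(kernel="linear")
    cv_acc = cross_val_score(clf, X_tr, y_tr, cv=5).mean()   # N-fold CV accuracy
    clf.fit(X_tr, y_tr)
    return cv_acc, clf.score(X_te, y_te)                     # test accuracy
```

On real data, `features` would be the concatenated DCT histograms of Step 2 and `labels` the C1/C2 class of each image.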