Introduction to Compression. Norm Zeck

Vita
BSEE, University of Buffalo (Microcoded Computer Architecture)
MSEE, University of Rochester (Thesis: CMOS VLSI Design)
Retired from Palo Alto Research Center (PARC), a Xerox company (Rochester, NY site); 37 years.
Research projects: selenium distillation (digital process controls), mechatronics, VLSI for contour font rasterization, halftone hardware/rendering, document imaging systems, multi-mode compression, page-parallel raster image processing (aka Hadoop, 10+ years earlier), data analytics
18 patents
Research management:
  Competency area: 15 researchers
  Lab: 85 researchers
  Special projects/operations: ~225 researchers
  Program: 5-10 projects across centers (transportation, healthcare, call centers, government payment, child support)

Goals
Intro to lossless and lossy compression
Focus on models, with one example of encoding
Little or no math
Use of examples to highlight the main modeling concepts
Launch point for individual study on topics of interest

Why Compress?
Columns: message source | channel/storage | message destination (application) | value proposition
Cable TV video (MPEG) | cable distribution system, DVR | home TV | more channels to the customer for the same infrastructure investment
Camera sensor (JPEG, MPEG) | solid state storage, transmission (sharing) | view/print/share photo/video | more pictures/video per storage unit, faster transmission on the communication channel, miniaturization of phones/cameras
Digital audio (MP3) | solid state storage | listen to audio | more audio per storage unit, enabled miniaturization of the player
Document (CCITT G4/G3) | telephone/fax | sent document | ability to use low-bandwidth communication channels for document transmission; universal availability to share documents
File storage (ZIP, PDF, Word) | disk/net | file exchange, storage | storage/transmission efficiency

General Compression Design Elements
Block diagram, compression: Application, Model, Encoder. Decompression: Decoder, Model, Application.
Application examples: text documents, wave files (mp3), images (jpeg).
The model understands the information needed in the message to support the application and enable compression:
  Text has repetitive words or sets of words that are reused. There is no need to store each word, just a reference.
  The probability of occurrence of symbols in the message is known.
  Human visual system model (jpeg, mpeg)
  Human auditory system model (mp3)
The encoder is designed to take advantage of the model and the application. It often includes a formatter as well (central to the application).

How to measure compression?
Lossless compression typically has four measures:
  Size: how much the message is compressed.
  Complexity: affects implementation in software or hardware.
  Speed: time to compress or decompress; closely tied to complexity.
  Resources: memory, ability to multi-thread, silicon real estate for hardware.
Lossy compression adds quality: how much information, and which information, to lose from the message.
Some applications may want the compressor and decompressor to differ in complexity and performance:
  DVD compression: may want to spend a lot of time, complexity, and analysis in the compressor to get the best quality per bit rate for the DVD master.
  DVD decompression: often implemented in commodity hardware or real-time software, so it should be simpler and more constrained.

How to measure compression size?
Technical community (bigger is better): ratio of original size to compressed size, written N:1. For example, 100,000 bytes compressed to 50,000 bytes is 100,000/50,000 = 2:1.
Some applications use percent of original; in this case, 50%.
Streaming applications such as MPEG (video) and MP3 (audio) often use bit rate, because the channel bandwidth is fixed or limited: cable, telephone, cell, disk.
  MP3: 128-256 kbit/s; MPEG-1: 1.5 Mbit/s; MPEG-2: ~10 Mbit/s
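
The three size measures above are simple arithmetic. A minimal sketch, reusing the slide's 100,000-byte example; the function names are illustrative only:

```python
def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Original size divided by compressed size (bigger is better)."""
    return original_bytes / compressed_bytes


def percent_of_original(original_bytes: int, compressed_bytes: int) -> float:
    """Compressed size as a percentage of the original (smaller is better)."""
    return 100.0 * compressed_bytes / original_bytes


def bit_rate(compressed_bytes: int, duration_seconds: float) -> float:
    """Average bits per second for a stream of the given duration."""
    return 8 * compressed_bytes / duration_seconds


ratio = compression_ratio(100_000, 50_000)
percent = percent_of_original(100_000, 50_000)
print(f"{ratio:g}:1, {percent:g}% of original")
```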

Two Lossless Compression Models

LZ77 sliding window dictionary
Model: "I've seen this before"
Many data sets have repeated information, for example the word "the" in text.
LZ77: Abraham Lempel, Jacob Ziv
Diagram labels: past data (search buffer), current pointer, future data (look-ahead)

LZ77 sliding window dictionary
As the input is read, a sliding window of data is kept. As more is read, the new input is added to the window and the oldest data is discarded.
Compression happens by finding repeated data at the input that is also in the window. The compressor emits a pointer to the data in the window, a copy length, and the next symbol.
The longer the string of symbols found in the window, the longer the copy length and the better the input is compressed.
The decompressor just needs to maintain the window, index back into it, and copy out "length" symbols.
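
The scheme described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a production codec: it uses a brute-force window search, a small hypothetical window size, and plain (offset, length, next symbol) tuples rather than a packed bit format.

```python
from typing import List, Tuple

# (offset back into the window, copy length, next symbol)
Token = Tuple[int, int, str]


def lz77_encode(data: str, window_size: int = 32) -> List[Token]:
    tokens = []
    pos = 0
    while pos < len(data):
        best_offset, best_length = 0, 0
        window_start = max(0, pos - window_size)
        # Brute-force search of the window for the longest match.
        for start in range(window_start, pos):
            length = 0
            # Stop one symbol early so there is always a "next symbol" to emit.
            while (pos + length < len(data) - 1
                   and data[start + length] == data[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = pos - start, length
        tokens.append((best_offset, best_length, data[pos + best_length]))
        pos += best_length + 1
    return tokens


def lz77_decode(tokens: List[Token]) -> str:
    out: List[str] = []
    for offset, length, next_symbol in tokens:
        start = len(out) - offset
        # Copy symbol by symbol so a match may overlap the data being written.
        for i in range(length):
            out.append(out[start + i])
        out.append(next_symbol)
    return "".join(out)


text = "the cat and the dog and the bird"
tokens = lz77_encode(text)
print(len(text), "symbols ->", len(tokens), "tokens")
```

Note that the decoder never searches: it only indexes back and copies, which is why decompression is much cheaper than compression.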

7-Zip Compression Methods (from Wikipedia)
DEFLATE: Standard algorithm based on 32 KB LZ77 (actually LZSS, Lempel-Ziv-Storer-Szymanski) and Huffman coding. Deflate is found in several file formats including ZIP, gzip, PNG and PDF. 7-Zip contains a from-scratch DEFLATE encoder that frequently beats the de facto standard zlib version in compression size, but at the expense of CPU usage.
DEFLATE64: LZ77 with a 64 KB window and Huffman coding.
LZMA: A variation of the LZ77 algorithm, using a sliding dictionary up to 4 GB in length for duplicate string elimination. The LZ stage is followed by entropy coding using a Markov chain-based range coder and binary trees. Window 64k to 1GB.
LZMA2: Modified version of LZMA providing better multithreading support and less expansion of incompressible data. (LZ77)
Bzip2: The standard Burrows-Wheeler transform (BWT) algorithm. Bzip2 uses two reversible transformations, BWT and then move-to-front, followed by Huffman coding for symbol reduction (the actual compression element). BWT orders text in a way that increases the runs of repeated characters.
PPMd (Prediction by Partial Matching): PPM models use a set of previous symbols in the uncompressed symbol stream to predict the next symbol in the stream. Can use Huffman or dictionary (LZ77) coding after the prediction.
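
Several of these methods can be tried without 7-Zip itself. A minimal sketch using Python's standard library modules zlib (DEFLATE), bz2 (bzip2) and lzma (which writes the .xz container using LZMA2 by default); the test data is made up and deliberately repetitive, so the ratios are not representative of real files:

```python
import bz2
import lzma
import zlib

# Hypothetical, highly repetitive input.
data = b"the quick brown fox jumps over the lazy dog. " * 200

compressors = {
    "DEFLATE (zlib)": (zlib.compress, zlib.decompress),
    "bzip2 (bz2)": (bz2.compress, bz2.decompress),
    "LZMA2 (lzma, .xz)": (lzma.compress, lzma.decompress),
}

results = {}
for name, (compress, decompress) in compressors.items():
    packed = compress(data)
    assert decompress(packed) == data  # lossless round trip
    results[name] = len(packed)
    ratio = len(data) / len(packed)
    print(f"{name}: {len(data)} -> {len(packed)} bytes, {ratio:.1f}:1")
```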

Observations on images
Images consist of a series of scan lines: horizontal rows of pixels.
Model: scan lines tend to have runs of pixels, as well as small differences between adjacent scan lines.
Figure callouts: example runs of black pixels; small differences between scan lines.
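
Both observations are easy to check on a small example. A sketch assuming a bilevel image stored as lists of 0/1 pixels; the pixel values are made up:

```python
from itertools import groupby


def run_lengths(scan_line):
    """Collapse a scan line into (pixel value, run length) pairs."""
    return [(pixel, len(list(run))) for pixel, run in groupby(scan_line)]


def changed_pixels(previous_line, current_line):
    """Count positions where two adjacent scan lines differ."""
    return sum(1 for a, b in zip(previous_line, current_line) if a != b)


line_1 = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
line_2 = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]

print(run_lengths(line_1))             # [(0, 3), (1, 5), (0, 2)]
print(changed_pixels(line_1, line_2))  # 1
```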

PNG File Compression
The source file is analyzed to see which of the filters in the table works best for the current scan line. A filter-type byte selects which filter to use. The predictor emits either X itself or the difference between X and the predicted byte (pixel), based on the analysis. If the predictor is accurate, many common values result (typically 0), which improves compression in the DEFLATE (LZ77) compressor. The decompressor implements the inverse.
Pipeline: source file, 2-line predictor, DEFLATE, PNG file.
2-line predictor: neighbors A, B, C, D can be used to predict X. It emits X (raw) or one of the filter outputs from the table, selected by which one best predicts X.
Filter types are chosen adaptively, scan line by scan line.
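
A rough sketch of per-line filter selection, assuming one byte per pixel. It implements only three of PNG's filter types (0 = None, 1 = Sub, 2 = Up) and picks the one with the smallest sum of absolute differences, which is a common heuristic and not necessarily what any particular encoder does. The names "left" and "above" are used here in place of the slide's A-D labels.

```python
def filter_none(line, prev):
    return list(line)


def filter_sub(line, prev):
    # Predict each byte from the byte to its left (0 for the first byte).
    return [(x - (line[i - 1] if i else 0)) % 256 for i, x in enumerate(line)]


def filter_up(line, prev):
    # Predict each byte from the byte directly above it.
    return [(x - above) % 256 for x, above in zip(line, prev)]


FILTERS = {0: filter_none, 1: filter_sub, 2: filter_up}


def cost(filtered):
    # Treat each byte as a signed difference and sum the magnitudes.
    return sum(min(v, 256 - v) for v in filtered)


def choose_filter(line, prev):
    best = min(FILTERS, key=lambda t: cost(FILTERS[t](line, prev)))
    return best, FILTERS[best](line, prev)


def unfilter(filter_type, filtered, prev):
    if filter_type == 0:
        return list(filtered)
    if filter_type == 1:
        out = []
        for i, v in enumerate(filtered):
            out.append((v + (out[i - 1] if i else 0)) % 256)
        return out
    if filter_type == 2:
        return [(v + above) % 256 for v, above in zip(filtered, prev)]
    raise ValueError("filter type not implemented in this sketch")


prev_line = [10, 20, 30, 40, 50]
this_line = [10, 20, 31, 40, 50]
filter_type, filtered = choose_filter(this_line, prev_line)
print(filter_type, filtered)  # 2 [0, 0, 1, 0, 0]
```

The mostly-zero output is what gives DEFLATE long, cheap matches.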

Lossy Image Compression

Standards
JPEG: Joint Photographic Experts Group. The Pink book is the standard.
MPEG: Moving Picture Experts Group. MPEG 1, 2 (DVD), MPEG layer 3 (MP3) (DVD audio)
Many other standard, semi-standard and proprietary formats

Joint Photographic Experts Group (JPEG)
Lossy compression modeled after the human visual system (HVS)
~120 million rods, ~5-6 million cones in the human eye

Human Eye Spatial Frequency Response
Some spatial frequencies are less visible to the eye.
The eye is less sensitive to high spatial frequencies in color than in luminance.
JPEG takes advantage of this in choosing what information to lose.

Color and Luminance are coded separately
The RGB color space has redundant information.
YCrCb: color is mapped to two components whose information content can be reduced further with little loss in perceived quality; luminance is separate and can be treated differently.
Low complexity: the color space transform is a simple linear conversion.
Y = luminance (think monochrome); Cr and Cb are the two color channels.
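
The linear conversion can be written out directly. A sketch assuming 8-bit RGB input and the full-range coefficients used by JFIF (the usual JPEG file format); the slides do not say which variant they have in mind, and the function names follow the more common YCbCr ordering:

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range conversion for 8-bit values (JFIF-style coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse of rgb_to_ycbcr, up to rounding of the coefficients."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b


# A neutral gray has no color content: both chroma channels sit at 128.
print(rgb_to_ycbcr(128, 128, 128))
```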

Subsampling example
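
One simple way to subsample a color channel is to average each 2x2 block of samples, halving the resolution in both directions. This is an assumption for illustration; the slide's figure may show a different scheme. The channel is a plain list of rows with even width and height.

```python
def subsample_2x2(channel):
    """Average each 2x2 block; width and height are assumed to be even."""
    height, width = len(channel), len(channel[0])
    return [
        [
            (channel[y][x] + channel[y][x + 1]
             + channel[y + 1][x] + channel[y + 1][x + 1]) / 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


# Hypothetical 2x4 chroma channel.
cb = [
    [10, 20, 30, 40],
    [10, 20, 30, 40],
]
print(subsample_2x2(cb))  # [[15.0, 35.0]]
```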

JPEG Block Diagram
The key is the use of the Discrete Cosine Transform (DCT) to map the spatial image to the frequency domain. Frequencies that are less visible are quantized more coarsely or removed, and the remainder is losslessly coded via Huffman coding.
A color transform is included; the color channels are then downsampled to take advantage of the eye's lower spatial frequency response to color.

Divide image into 8x8 pixel blocks

Discrete Cosine Transform (DCT)
Basis functions and sample reconstruction

Discrete Cosine Transform (DCT)
Basis functions and sample reconstruction
* DC term; the rest are called AC terms
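
For readers who want to see the DC and AC terms come out of an actual block, here is a direct (slow) 8x8 DCT and its inverse in pure Python. It is a sketch using the orthonormal DCT-II scaling; real codecs use fast factorizations, and the helper names are illustrative.

```python
import math

N = 8  # JPEG block size


def _scale(k):
    return math.sqrt(0.5) if k == 0 else 1.0


def _cos(sample, freq):
    return math.cos((2 * sample + 1) * freq * math.pi / (2 * N))


def dct_2d(block):
    """Forward 8x8 DCT: spatial samples to frequency coefficients."""
    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += block[x][y] * _cos(x, u) * _cos(y, v)
            coeffs[u][v] = 0.25 * _scale(u) * _scale(v) * total
    return coeffs


def idct_2d(coeffs):
    """Inverse 8x8 DCT: weighted sum of the 64 basis functions."""
    block = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (_scale(u) * _scale(v) * coeffs[u][v]
                              * _cos(x, u) * _cos(y, v))
            block[x][y] = 0.25 * total
    return block


# A flat block has only a DC term; every AC term is (numerically) zero.
flat = [[100] * N for _ in range(N)]
print(round(dct_2d(flat)[0][0], 6))  # 800.0
```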

JPEG Quantization Tables
Based on psychovisual threshold experiments
Luminance: not subsampled, lighter quantization (luminance quantization table)
Chrominance: subsampled 2:1, heavier quantization (chrominance quantization table)
Larger numbers quantize the DCT coefficient more heavily.
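
Quantization itself is just a divide-and-round per coefficient, and dequantization a multiply. A sketch with made-up coefficients and illustrative step sizes (a single row rather than a full 8x8 table):

```python
def quantize(coeffs, table):
    """Divide each coefficient by its step size and round to an integer."""
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, table)]


def dequantize(levels, table):
    """Multiply back; the rounding error is the information that was lost."""
    return [[level * q for level, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, table)]


# Hypothetical DCT coefficients and step sizes for one row of a block.
coeffs = [[800.0, -31.0, 12.0, 3.0]]
steps = [[16, 11, 10, 16]]

levels = quantize(coeffs, steps)
print(levels)                     # [[50, -3, 1, 0]]
print(dequantize(levels, steps))  # [[800, -33, 10, 0]]
```

The small high-frequency coefficient rounds to zero, which is where both the loss and the later coding gain come from.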

JPEG Block Diagram
Quantization adds redundancy to the coefficients: many become zero or repeat. The encoder uses Huffman coding to represent the quantized coefficients efficiently.

Prefix Codes
Model: output is more redundant, but not necessarily encoded (represented) efficiently.
Encoder: more efficient representation (same information).
Fixed-length codes (ASCII, 8 bits): easy to use, not efficient ("a" takes the same number of bits as "z").
Variable-length codes: more frequent symbols get fewer bits.
Prefix codes: unique variable-length codes that can easily be parsed from left to right, for example 0, 10, 110, 1110.
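
Left-to-right parsing works because no code is the start of another code. A minimal sketch using the four codes from the slide; the symbols a-d assigned to them are hypothetical:

```python
# Code words from the slide; the symbol assignment is made up.
CODES = {"0": "a", "10": "b", "110": "c", "1110": "d"}


def encode(text):
    inverse = {symbol: code for code, symbol in CODES.items()}
    return "".join(inverse[ch] for ch in text)


def decode(bits):
    symbols = []
    current = ""
    for bit in bits:
        current += bit
        # As soon as the bits read so far form a code word, emit its symbol.
        if current in CODES:
            symbols.append(CODES[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(symbols)


print(decode("0101100"))  # abca
```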

Huffman Encoding History
In 1951, David A. Huffman and his MIT information theory classmates were given the choice of a term paper or a final exam. The professor, Robert M. Fano, assigned a term paper on the problem of finding the most efficient binary code. Huffman, unable to prove any codes were the most efficient, was about to give up and start studying for the final when he hit upon the idea of using a frequency-sorted binary tree and quickly proved this method the most efficient. In doing so, Huffman outdid Fano, who had worked with information theory inventor Claude Shannon to develop a similar code. By building the tree from the bottom up instead of the top down, Huffman avoided the major flaw of the suboptimal Shannon-Fano coding. (Wikipedia) 3/2/2016

Example Message
Message contains 132 symbols = 528 bits (132 * 4 bits/symbol)
44444444 8888888888888 9999999999 2222 333333333333 2222 555555 44444444 555555 2222 8888888888888 2222 555555 2222 555555 333333333333 555555 9999999999

Example Message Huffman Coding
Know the probability of each symbol occurring in the message.
Organize the symbols as a binary tree.
Apply a prefix code that uniquely identifies each sequence, optimized by the ordering of the binary tree.
Tree, internal nodes (probability, prefix): 1.0 (root), 0.6 (1), 0.3 (11), 0.2 (111), 0.1 (1111)
Sequence        Probability  Code
2222            0.4          0
555555          0.3          10
333333333333    0.1          110
44444444        0.1          1110
8888888888888   0.05         11110
9999999999      0.05         11111
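
The bottom-up tree construction can be sketched with a priority queue. This is an illustration that computes code lengths only; the weights are the table's probabilities scaled by 20 to keep the arithmetic in integers (an assumption made here for convenience). Because several nodes tie at the same weight, the tree it builds may have a different shape from the one on the slide, but the average code length is the same.

```python
import heapq
from itertools import count

# Probabilities from the table, multiplied by 20.
WEIGHTS = {
    "2222": 8,
    "555555": 6,
    "333333333333": 2,
    "44444444": 2,
    "8888888888888": 1,
    "9999999999": 1,
}


def huffman_code_lengths(weights):
    """Return {symbol: code length} by repeatedly merging the two lightest nodes."""
    tie_breaker = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tie_breaker), {symbol: 0}) for symbol, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, depths1 = heapq.heappop(heap)
        w2, _, depths2 = heapq.heappop(heap)
        # Every symbol under the new parent node moves one level deeper.
        merged = {s: d + 1 for s, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (w1 + w2, next(tie_breaker), merged))
    return heap[0][2]


lengths = huffman_code_lengths(WEIGHTS)
total = sum(WEIGHTS.values())
average_bits = sum(WEIGHTS[s] * lengths[s] for s in WEIGHTS) / total
print(lengths)
print(average_bits)  # 2.2 bits per sequence, the same as the slide's code
```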

Example Message Huffman encoded
Message contains 132 symbols = 528 bits (132 * 4 bits/symbol)
44444444 8888888888888 9999999999 2222 333333333333 2222 555555 44444444 555555 2222 8888888888888 2222 555555 2222 555555 333333333333 555555 9999999999
Encoded, with the number of bits in each row:
[1110] [11110] [11111] [0]              15
[110] [0] [10] [1110] [10]              12
[0] [11110] [0] [0] [10] [0] [10]       13
[110] [10] [11111]                      10
Huffman encoding contains 50 bits
Compression: (528/50):1, or about 10.5:1

Problem with high frequency term quantization
Loss of high-frequency terms results in ringing: the inability to reconstruct edges cleanly. For the math, see the Gibbs phenomenon.

Lossy compression: Quality?
How do you compare different algorithms, or determine whether the loss is OK for your application?
Lossless: just compare compression ratios.
Lossy: want to lose information that is not important to the application.
  Use a model of the perceptual part of the human visual system (HVS): tried that; we do not understand the HVS well enough to make a good model.
  Human psycho-visual experiments:
    Select images, process, print or view original vs. processed.
    Rank with as many observers as you can get.
    Expensive: labor and controlled lab setup.
    Population bias: researchers are very critical, others not critical enough.
    Image type bias.

Test image example
Processing, printing/viewing, and ranking images have a per-image cost: select the image, process it, set up printing or viewing, schedule observers, process observations, and monitor and check each step.
Often a single image contains lots of content, to stress and cover a wide range of possible errors.
Need to use customer/market images with customers.

Test image example, compressed 48:1
[Figure: original vs. compressed]

Comparison, ~48:1
[Figure: original vs. compressed]

LZMA SDK (you too can add compression to your application)
LZMA SDK is available to use. The LZMA SDK provides the documentation, samples, header files, libraries, and tools you need to develop applications that use LZMA compression.
LZMA SDK includes:
  C++ source code of LZMA Encoder and Decoder
  C++ source code for .7z compression and decompression (reduced version)
  ANSI-C compatible source code for LZMA / LZMA2 / XZ compression and decompression
  ANSI-C compatible source code for 7z decompression with example
  C# source code for LZMA compression and decompression
  Java source code for LZMA compression and decompression
  lzma.exe for .lzma compression and decompression
  7zr.exe to work with 7z archives (reduced version of 7z.exe from 7-Zip)
  SFX modules to create self-extracting packages and installers
http://www.7-zip.org/sdk.html

Moving Pictures Experts Group Layer III, aka MP3
Block diagram labels: Audio Signal, Filter Bank (32 Sub-bands), MDCT, 512 samples, Non-uniform Quantizer, Huffman Encoding, Side Band Encoding, Stream Formatting, FFT, Psychoacoustic Model (Analysis and Modeling for Quantizer)
MDCT: modified discrete cosine transform, applied to each sub-band's samples
FFT: Fast Fourier Transform; maps to the frequency domain for analysis
Raw CD audio is ~10 MB/minute. MP3 compression is typically ~10-11:1, reducing it to ~1 MB/minute.
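
The size figures above can be checked with a little arithmetic. A sketch assuming standard CD audio parameters (44.1 kHz, 16 bits per sample, stereo), decimal megabytes, and a 128 kbit/s MP3 stream; none of these are stated on the slide.

```python
# Assumed CD audio parameters.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

raw_bits_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS


def megabytes_per_minute(bits_per_second):
    """Convert a bit rate to decimal megabytes per minute."""
    return bits_per_second * 60 / 8 / 1_000_000


mp3_bits_per_second = 128_000  # assumed MP3 bit rate

raw_mb = megabytes_per_minute(raw_bits_per_second)
mp3_mb = megabytes_per_minute(mp3_bits_per_second)
ratio = raw_bits_per_second / mp3_bits_per_second

print(f"raw CD audio: {raw_mb:.2f} MB/minute")  # 10.58
print(f"128 kbit/s MP3: {mp3_mb:.2f} MB/minute")  # 0.96
print(f"ratio: {ratio:.1f}:1")  # 11.0:1
```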

MPEG Frame Encoding
I = DCT-encoded reference frame; no other frames are used
P = uses only previous frames for prediction
B = uses both previous and following frames for prediction