EE67I Multimedia Communication Systems
Lecture 4: Lossless Compression

Basics of Information Theory

Compression is either lossless, in which no information is lost, or lossy, in which some information is discarded. The compression ratio is B0/B1, where B0 is the number of bits before compression and B1 the number after; a desirable compression ratio is B0/B1 > 1.

Information Theory

The entropy η of an information source with alphabet S = {s1, s2, ..., sn} is

    η = Σ_{i=1..n} p_i log2(1/p_i)

where p_i is the probability that symbol s_i in S occurs. The term log2(1/p_i) is the self-information of s_i and corresponds to the number of bits needed to code s_i. Entropy is a measure of disorder in a system, so negative entropy corresponds to order being added to a system.

The figure on the left above shows the histogram of an image with a uniform distribution of gray-level intensities, so that p_i = 1/256. The entropy of this image is

    η = 256 × (1/256) × log2(256) = 8 bits.

The average code length l̄ is usually greater than or equal to the entropy: l̄ ≥ η. The figure on the right above shows a histogram in which 2/3 of the pixels are bright and 1/3 are dark. Its entropy is

    η = (2/3) log2(3/2) + (1/3) log2(3) ≈ 0.92 bits.
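The two entropy figures above are easy to verify numerically. Below is a minimal Python sketch (the function name `entropy` is mine, not from the lecture):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: sum of p * log2(1/p) over all nonzero p."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Uniform histogram: 256 gray levels, each with p_i = 1/256.
print(entropy([1.0 / 256] * 256))             # 8.0

# Two-level histogram: 2/3 bright pixels, 1/3 dark pixels.
print(round(entropy([2.0 / 3, 1.0 / 3]), 2))  # 0.92
```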
Run-Length Coding (RLC)

Coding is performed on groups in which the same symbol repeats continuously: instead of coding every symbol, code the symbol once together with the length of its run. A bi-level image (1-bit black-and-white pixels) has monotone regions that can be coded well with RLC, since only the length of each run of a particular color needs to be coded.

Variable-Length Coding (VLC)

Shannon-Fano Algorithm (top-down)

Steps:
1. Sort the symbols according to the frequency count of their occurrences.
2. Recursively divide the symbols into two parts, each with approximately the same total count, until each part contains only one symbol.

The entropy is as follows: Another coding tree:
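The two Shannon-Fano steps above can be sketched in Python as follows; this is an illustrative implementation (function names are mine), choosing at each level the split point that balances the counts most closely:

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, count) pairs. Returns {symbol: bit string}."""
    symbols = sorted(symbols, key=lambda sc: sc[1], reverse=True)
    codes = {s: "" for s, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(c for _, c in group)
        # Pick the split point that makes the two halves' counts closest.
        cut = min(range(1, len(group)),
                  key=lambda i: abs(total - 2 * sum(c for _, c in group[:i])))
        for s, _ in group[:cut]:
            codes[s] += "0"          # top half gets a 0...
        for s, _ in group[cut:]:
            codes[s] += "1"          # ...bottom half a 1, then recurse
        split(group[:cut])
        split(group[cut:])

    split(symbols)
    return codes

# Symbol counts of HELLO: the most frequent symbol L gets the shortest code.
print(shannon_fano([("L", 2), ("H", 1), ("E", 1), ("O", 1)]))
```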
Huffman Coding (bottom-up)

Algorithm:
1. Initialization: put all symbols on a list sorted according to their frequency counts.
2. Repeat until the list has only one symbol left:
   a. From the list, pick the two symbols with the lowest frequency counts, form a subtree with them as children, and create a parent node for them.
   b. Assign the sum of the children's frequency counts to the parent node and insert it into the list so that the order is maintained.
   c. Delete the children from the list.
3. Assign a code word to each leaf.

The algorithm is demonstrated using HELLO:

Properties of Huffman Coding
1. The unique-prefix property precludes ambiguity and makes decoding efficient.
2. Optimality (minimum-redundancy code):
   a. The two least frequent symbols have codes of the same length, differing only in the last bit.
   b. Frequently occurring symbols have shorter codes.
   c. The average code length is less than η + 1.

Extended Huffman Coding

For a symbol with a very large probability, the self-information is almost 0, so it is costly to spend a whole bit to code it. To counteract this, a single code word is used to code a group of k symbols:
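The bottom-up merging loop above maps naturally onto a min-heap. Below is an illustrative Python sketch (the heap-based structure is mine; tie-breaking among equal counts may produce a different but equally optimal tree than the lecture's figure):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table for the symbols of `text` (bottom-up merging)."""
    freq = Counter(text)
    # Heap entries: (count, tiebreak, {symbol: code-so-far}).
    heap = [(c, i, {s: ""}) for i, (s, c) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)      # the two lowest-count subtrees...
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, tiebreak, merged))  # ...become one parent
        tiebreak += 1
    return heap[0][2]

print(huffman_codes("HELLO"))
```

Any tree this produces is optimal: for HELLO the total is always 10 bits, i.e. an average of 2 bits/symbol against an entropy of about 1.92 bits.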
The size of this extended alphabet is n^k. If k is relatively large (e.g., k ≥ 3), then for most practical applications where n >> 1, n^k is a very large number, implying a large symbol table. The average number of bits per original symbol now satisfies

    η ≤ l̄ < η + 1/k.

Adaptive Huffman Coding

Regular Huffman coding needs prior statistical information, which is not always available in multimedia applications. Basic Huffman is an order-0 model, in which no preceding symbols are taken into account; an order-k model looks at the k preceding symbols for contextual information. In adaptive Huffman coding, the probability distribution of the received symbols changes the probabilities assigned to each symbol. The algorithm is as follows:

- Initial_code assigns symbols some initially agreed-upon codes, without any prior knowledge of the statistics.
- Update_tree is a procedure for constructing an adaptive Huffman tree: it increments the frequency count of the received symbol and updates the configuration of the tree.
  o The Huffman tree must maintain the sibling property (all nodes are arranged in order of increasing counts); if this is about to be violated, a swap procedure is invoked.
  o In the swap procedure, the farthest node with count N is swapped with the node whose count has just been increased to N + 1. If the swapped node is not a leaf node, its entire subtree goes with it during the swap.
- The encoder and decoder must use the same Initial_code and Update_tree routines.
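A quick numeric illustration of the extended-alphabet gain (my own example, not from the lecture): for a skewed source with p(A) = 0.9, p(B) = 0.1, the entropy is about 0.47 bits/symbol, yet plain Huffman cannot go below 1 bit/symbol. Coding blocks of k symbols closes the gap:

```python
import heapq
import itertools
import math

def huffman_lengths(probs):
    """Code lengths of an optimal prefix code for the given probabilities."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    lengths = [0] * len(probs)
    heapq.heapify(heap)
    tb = len(heap)
    while len(heap) > 1:
        p1, _, ids1 = heapq.heappop(heap)
        p2, _, ids2 = heapq.heappop(heap)
        for i in ids1 + ids2:
            lengths[i] += 1          # each merge adds one bit to every member
        heapq.heappush(heap, (p1 + p2, tb, ids1 + ids2))
        tb += 1
    return lengths

p = {"A": 0.9, "B": 0.1}
eta = sum(q * math.log2(1 / q) for q in p.values())   # ~0.469 bits/symbol

for k in (1, 2, 3):
    # Extended alphabet: all n^k blocks of k symbols, with product probabilities.
    blocks = [math.prod(p[s] for s in blk) for blk in itertools.product(p, repeat=k)]
    avg = sum(q * l for q, l in zip(blocks, huffman_lengths(blocks))) / k
    print(k, round(avg, 3))   # bits per ORIGINAL symbol: approaches eta as k grows
```

The averages fall from 1.0 (k = 1) toward the entropy, at the price of a 2^k-entry symbol table.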
Example of Coding AADCCDD
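The AADCCDD example can be simulated with a simplified adaptive coder. The sketch below is NOT the incremental update/swap procedure described above: for clarity it simply rebuilds the Huffman table from the running counts after every symbol, and it sidesteps Initial_code by assuming the alphabet {A, C, D} is known up front with every count starting at 1. It does demonstrate the key property: encoder and decoder stay in sync because both apply the same update after each symbol.

```python
import heapq
from collections import Counter

ALPHABET = "ACD"   # assumed known up front (real adaptive Huffman uses Initial_code)

def codes_from_counts(counts):
    """Rebuild a Huffman table from the current counts (deterministic ties)."""
    heap = [(counts[s], s, {s: ""}) for s in sorted(counts)]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, k1, t1 = heapq.heappop(heap)
        c2, k2, t2 = heapq.heappop(heap)
        t = {s: "0" + c for s, c in t1.items()}
        t.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (c1 + c2, min(k1, k2), t))
    return heap[0][2]

def adaptive_encode(msg):
    counts = Counter({s: 1 for s in ALPHABET})
    out = []
    for s in msg:
        out.append(codes_from_counts(counts)[s])  # code with the CURRENT table,
        counts[s] += 1                            # then update, so the decoder can mirror it
    return "".join(out)

def adaptive_decode(bits):
    counts = Counter({s: 1 for s in ALPHABET})
    out = []
    while bits:
        table = {v: k for k, v in codes_from_counts(counts).items()}
        for i in range(1, len(bits) + 1):         # prefix-free: first match wins
            if bits[:i] in table:
                s = table[bits[:i]]
                out.append(s)
                counts[s] += 1
                bits = bits[i:]
                break
    return "".join(out)

enc = adaptive_encode("AADCCDD")
print(enc, "->", adaptive_decode(enc))
```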
Dictionary-Based Coding

Lempel-Ziv-Welch (LZW) employs an adaptive, dictionary-based compression technique:
- It uses fixed-length code words to represent variable-length strings.
- It builds the dictionary dynamically as it receives data, so that the encoder and decoder construct the same dictionary.
- It places longer and longer repeated entries into the dictionary and transmits the dictionary index rather than the string itself.

Example: compression of the string ABABBABCABABBA.

LZW commonly uses 12-bit code lengths, so the dictionary contains 4096 entries.

LZW Decompression Algorithm
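A compact Python sketch of the LZW encoder. For illustration it initializes the dictionary with all 256 single-byte strings (rather than only the example's three symbols) and does not enforce the 12-bit code-length cap:

```python
def lzw_compress(text):
    """LZW encoder: returns a list of dictionary indices (codes)."""
    dict_size = 256
    dictionary = {chr(i): i for i in range(dict_size)}  # single-char entries
    w = ""
    out = []
    for c in text:
        if w + c in dictionary:
            w = w + c                      # keep extending the current match
        else:
            out.append(dictionary[w])      # emit code for the longest match
            dictionary[w + c] = dict_size  # add the new, longer string
            dict_size += 1
            w = c
    if w:
        out.append(dictionary[w])
    return out

print(lzw_compress("ABABBABCABABBA"))
# [65, 66, 256, 257, 66, 67, 256, 258, 65]: 9 codes for 14 input symbols,
# with 256='AB', 257='BA', 258='ABB' created on the fly.
```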
Example of LZW Decompression

An example where LZW encounters difficulty: ABABBABCABBABBAX. Here the decoder can receive a code that is not yet in its own dictionary, because the encoder runs one step ahead of the decoder. Below is the modified LZW decompression code:
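A Python sketch of the modified decoder. The special branch handles exactly the case the ABABBABCABBABBAX example triggers: when the received code equals the next free dictionary slot, the entry must be w plus the first character of w. (The code lists in the usage lines were produced by running a matching LZW encoder over the two example strings.)

```python
def lzw_decompress(codes):
    """LZW decoder, including the not-yet-in-dictionary corner case."""
    dict_size = 256
    dictionary = {i: chr(i) for i in range(dict_size)}
    w = dictionary[codes[0]]
    out = [w]
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        elif k == dict_size:
            entry = w + w[0]     # entry still being built: it must be w + w[0]
        else:
            raise ValueError("bad LZW code: %d" % k)
        out.append(entry)
        dictionary[dict_size] = w + entry[0]   # mirror the encoder's new entry
        dict_size += 1
        w = entry
    return "".join(out)

print(lzw_decompress([65, 66, 256, 257, 66, 67, 256, 258, 65]))   # ABABBABCABABBA
print(lzw_decompress([65, 66, 256, 257, 66, 67, 258, 262, 88]))   # ABABBABCABBABBAX
# In the second call, code 262 arrives before entry 262 is complete,
# exercising the special branch.
```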
Because LZW code words (e.g., 12 bits) are longer than 8-bit ASCII characters, data that contains little redundancy undergoes data expansion instead of data reduction. The V.42bis modem standard compensates by having two modes, transparent and compressed; transparent mode is used when data expansion is detected. The dictionary size is fixed, so adaptation fails once the dictionary is full, which invokes a flushing of dictionary entries once a threshold is exceeded.

Arithmetic Coding

Arithmetic coding treats the whole message as one unit. In practice, the input data is broken into chunks to avoid error propagation. A message is represented by a half-open interval [a, b), where a and b are real numbers between 0 and 1. Initially the interval is [0, 1); it shortens as the message gets longer, and the number of bits needed to represent the interval increases. Below is the encoder algorithm:

The example below illustrates the encoder for the alphabet [A, B, C, D, E, F, $], where $ is the terminator symbol. The probability distribution is shown in the table below.
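The interval-narrowing loop of the encoder can be sketched in Python as below. The probability table is the one commonly used with this CAEE$ example (A 0.2, B 0.1, C 0.2, D 0.05, E 0.3, F 0.05, $ 0.1); if the lecture's table differs, substitute its values:

```python
# Assumed probability model for the alphabet [A, B, C, D, E, F, $].
PROB = {"A": 0.2, "B": 0.1, "C": 0.2, "D": 0.05, "E": 0.3, "F": 0.05, "$": 0.1}

def arithmetic_encode(msg):
    """Return the final half-open interval [low, high) for the message."""
    cum, acc = {}, 0.0
    for s, p in PROB.items():          # cumulative boundaries per symbol
        cum[s] = (acc, acc + p)
        acc += p
    low, high = 0.0, 1.0
    for s in msg:
        rng = high - low
        s_low, s_high = cum[s]
        high = low + rng * s_high      # shrink the interval to the
        low = low + rng * s_low        # current symbol's slice
    return low, high

low, high = arithmetic_encode("CAEE$")
print(low, high)
# Final interval ~[0.33184, 0.3322); its width 0.00036 equals the product
# of the five symbol probabilities 0.2 * 0.2 * 0.3 * 0.3 * 0.1.
```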
The figure above shows the encoding of the symbol sequence CAEE$; the following table shows the resulting ranges. Since each step scales the interval by the probability of the coded symbol, the final range equals the product of the probabilities of the coded symbols, i.e., the probability of those symbols occurring in succession.

The following algorithm is used to generate the codeword for the encoder. In the code, value(code) is the value of code read as a binary fraction, e.g., value(0.1₂) = 0.5₁₀. For this example the code generator yields 0.01010101₂, which equals 0.33203125₁₀.

In the worst case, the shortest codeword in arithmetic coding requires

    k = ⌈log2(1/range)⌉

bits to encode a sequence of symbols, where range = Π P_i is the final range generated by the encoder and P_i is the probability of symbol i. When the message is long, the difference between log2(1/range) and its ceiling is negligible.
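The codeword generator can be sketched as follows: append bits one at a time, choosing 1 whenever that keeps the value below the upper bound, and stop as soon as the value falls inside [low, high). (This is a didactic sketch; practical coders emit bits incrementally during encoding rather than from the final interval.)

```python
def generate_codeword(low, high):
    """Shortest binary fraction 0.b1b2... whose value lies in [low, high)."""
    code, value, step = "", 0.0, 0.5
    while value < low:                 # append bits until value enters the range
        if value + step < high:        # a '1' here still stays below the bound
            code += "1"
            value += step
        else:
            code += "0"
        step /= 2                      # next bit is worth half as much
    return code

print(generate_codeword(0.33184, 0.3322))   # 01010101, i.e. 0.33203125
```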
Below is the routine used in the arithmetic coding decoder. Applying it to the example above yields the following table.

In practice it is possible to rescale the intervals and use only integer arithmetic, making implementation more practical. Note that if the channel/network is noisy, the terminator symbol can be corrupted, causing the encoder and decoder to go out of sync.

Differential Coding of Images

Lossless Image Compression

For images we work with values in two dimensions (x, y). Because of spatial continuity, the gray-level intensities of background and foreground objects in images tend to change relatively slowly across the image frame. Given an original image I(x, y), the difference operator is defined by

    d(x, y) = I(x, y) - I(x - 1, y).

This is a simple approximation of the partial differential operator ∂/∂x applied to an image defined in terms of x and y. Alternatively, using the 2D Laplacian operator to define the difference image d(x, y) yields

    d(x, y) = 4 I(x, y) - I(x, y - 1) - I(x, y + 1) - I(x + 1, y) - I(x - 1, y).

Image (a) below shows the original image, and image (b) shows the difference image d(x, y). The histogram for (a), shown in (c), is broader than the histogram for (b), shown in (d), indicating that (a) has more entropy. Thus compression will work better on a difference image.
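The entropy reduction from differencing can be demonstrated on a toy scanline (the pixel values below are made up for illustration; a real image would use the 2-D operators above):

```python
import math
from collections import Counter

def entropy(values):
    """Entropy in bits per sample of the empirical histogram of `values`."""
    counts = Counter(values)
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

# A toy scanline with slowly varying gray levels.
row = [100, 101, 103, 104, 104, 105, 107, 110, 112, 111, 110, 108]

# d(x) = I(x) - I(x-1): the horizontal difference operator from the text.
diff = [row[x] - row[x - 1] for x in range(1, len(row))]

print(entropy(row), entropy(diff))  # the difference signal has lower entropy
```

The differences cluster around 0 into a narrow histogram, which is exactly why the difference image compresses better.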
Lossless JPEG

Lossless JPEG is invoked by choosing a 100% quality factor in JPEG. Two steps are involved: forming a differential prediction and encoding.
1. The predictor combines the values of up to three neighboring pixels using one of the seven schemes from the table below.
2. The encoder then compares the prediction to the actual pixel value and encodes the difference using one of the lossless techniques described above.

Lossless JPEG yields a low compression ratio, which makes it impractical for most multimedia applications. The table below compares the performance of different lossless compression techniques on different images.
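The seven predictors can be written out directly. The neighbor labels (A = left, B = above, C = above-left) and the formulas follow the standard lossless-JPEG predictor table; `residuals` is my own helper that applies one predictor to the interior pixels of a small test image:

```python
# Lossless-JPEG predictors 1-7. A = left, B = above, C = above-left
# (integer division, as JPEG works on integer samples).
PREDICTORS = {
    1: lambda A, B, C: A,
    2: lambda A, B, C: B,
    3: lambda A, B, C: C,
    4: lambda A, B, C: A + B - C,
    5: lambda A, B, C: A + (B - C) // 2,
    6: lambda A, B, C: B + (A - C) // 2,
    7: lambda A, B, C: (A + B) // 2,
}

def residuals(img, mode):
    """Prediction residuals for the interior pixels of a 2-D list `img`."""
    pred = PREDICTORS[mode]
    return [[img[y][x] - pred(img[y][x - 1], img[y - 1][x], img[y - 1][x - 1])
             for x in range(1, len(img[0]))]
            for y in range(1, len(img))]

ramp = [[10, 11, 12], [11, 12, 13], [12, 13, 14]]   # smooth test image
print(residuals(ramp, 4))   # [[0, 0], [0, 0]]: A + B - C is exact on a linear ramp
```

On smooth image regions the residuals concentrate near zero, which is what makes the subsequent entropy coding effective.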