Lecture 5 Lossless Coding (II) May 20, 2009
Outline Review Arithmetic Coding Dictionary Coding Run-Length Coding Lossless Image Coding Data Compression: Hall of Fame References 1
Review
Image and video encoding: A big picture [Block diagram: Input Image/Video → Pre-Processing (A/D conversion, color space conversion, pre-filtering, partitioning) → Lossy Coding (quantization; transform coding; predictive coding: differential coding, motion estimation and compensation; model-based coding) → Lossless Coding (entropy coding, dictionary-based coding, run-length coding, context-based coding) → Post-Processing (post-filtering) → Encoded Image/Video] 3
The ingredients of entropy coding A random source (X, P) A statistical model (X, P̂) as an estimate of the random source An algorithm to optimize the coding performance (i.e., to minimize the average codeword length) At least one designer 4
FLC, VLC and V2FLC FLC = Fixed-length coding/code(s)/codeword(s): each symbol x_i emitted from a random source (X, P) is encoded as an n-bit codeword, where |X| ≤ 2^n. VLC = Variable-length coding/code(s)/codeword(s): each symbol x_i emitted from a random source (X, P) is encoded as an n_i-bit codeword. FLC can be considered as a special case of VLC, where n_1 = … = n_|X| = n. V2FLC = Variable-to-fixed length coding/code(s)/codeword(s): a symbol or a string of symbols is encoded as an n-bit codeword. V2FLC can also be considered as a special case of VLC. 5
Static coding vs. Dynamic/Adaptive coding Static coding = The statistical model P is static, i.e., it does not change over time. Dynamic/Adaptive coding = The statistical model P is dynamically updated, i.e., it adapts itself to the context (changes over time). Dynamic/Adaptive coding ≈ Context-based coding Hybrid coding = Static + Dynamic coding: a codebook is maintained at the encoder side, and the encoder dynamically chooses a code for a number of symbols and informs the decoder about the choice. 6
A map of coding methods Shannon coding Shannon-Fano coding Huffman coding Arithmetic coding (Range coding) Shannon-Fano-Elias coding Universal coding: Exp-Golomb coding (H.264/MPEG-4 AVC, Dirac), Elias coding family, Levenshtein coding Non-universal coding: truncated binary coding, unary coding, Golomb coding, Rice coding Tunstall coding (V2FLC) David Salomon, Variable-length Codes for Data Compression, Springer, 2007 7
Shannon-Fano Code: An example X={A,B,C,D,E}, P={0.35,0.2,0.19,0.13,0.13}, Y={0,1} 0.35+0.2 0.19+0.13+0.13 A, B C, D, E 0.35 0.2 0.19 A B C 0.13+0.13 D, E 0.13 0.13 D E A Possible Code A 00 B 01 C 10 D 110 E 111 8
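The recursive split behind this example can be sketched in a few lines of Python (an illustrative sketch written for these notes; the function name `shannon_fano` is my own). The symbol list must be sorted by descending probability; each call splits it into two parts with probabilities as close to equal as possible:

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, probability) sorted by descending probability.
    Split into two parts with near-equal probability mass; prepend 0/1."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    best_i, best_diff = 1, None
    for i in range(1, len(symbols)):
        left = sum(p for _, p in symbols[:i])
        diff = abs(2 * left - total)       # |left sum - right sum|
        if best_diff is None or diff < best_diff:
            best_i, best_diff = i, diff
    code = {s: "0" + c for s, c in shannon_fano(symbols[:best_i]).items()}
    code.update({s: "1" + c for s, c in shannon_fano(symbols[best_i:]).items()})
    return code

codes = shannon_fano([("A", 0.35), ("B", 0.2), ("C", 0.19), ("D", 0.13), ("E", 0.13)])
```

Running this on the slide's source reproduces the code table above: A→00, B→01, C→10, D→110, E→111.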
Universal coding (code) A code is called universal if L ≤ C_1(H + C_2) for all possible values of H, where C_1, C_2 ≥ 1. You may see a different definition somewhere, but the basic idea remains the same: a universal code works like an optimal code, except that there is a bound defined by the constant C_1. A universal code is called asymptotically optimal if C_1 → 1 as H → ∞. 9
Coding positive/non-negative integers Naive binary coding: |X| = 2^k, X = {0,1,…,2^k−1}: encode each integer by its k-bit binary representation. Truncated binary coding: |X| = 2^k + b, X = {0,1,…,2^k+b−1}: the first 2^k − b values are encoded with k bits; the remaining 2b values are encoded with k+1 bits (as the (k+1)-bit representations of x + 2^k − b). Unary coding ('stone-age binary coding'): X = Z^+ = {1,2,…}: f(x) = x−1 zeros followed by a single 1 (or, equivalently, x−1 ones followed by a 0). 10
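The unary and truncated binary codes above can be written out directly (a minimal sketch for these notes; function names are mine). `truncated_binary(x, n)` encodes x ∈ {0,…,n−1} with n = 2^k + b:

```python
def unary(x):
    """Unary code of x in {1, 2, ...}: (x-1) zeros followed by a 1."""
    return "0" * (x - 1) + "1"

def truncated_binary(x, n):
    """Truncated binary code for x in {0, ..., n-1}.
    With n = 2^k + b, the first 2^k - b values get k bits,
    the remaining 2b values get k+1 bits."""
    k = n.bit_length() - 1          # floor(log2(n))
    u = (1 << (k + 1)) - n          # number of k-bit codewords = 2^k - b
    if x < u:
        return format(x, "0{}b".format(k)) if k > 0 else ""
    return format(x + u, "0{}b".format(k + 1))
```

For n = 5 (k = 2, b = 1) this gives 00, 01, 10, 110, 111, i.e. three 2-bit and two 3-bit codewords, as described above.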
Golomb coding and Rice coding Golomb coding = unary coding + truncated binary coding. An integer x is divided into two parts (quotient and remainder) according to a parameter M: q = ⌊x/M⌋, r = mod(x, M) = x − qM. Golomb code = unary code of q + truncated binary code of r. When M = 1, Golomb coding = unary coding. When M = 2^k, Golomb coding = Rice coding. The Golomb code (with a properly chosen M) is the optimal code for the geometric distribution Prob(x = i) = (1−p)^{i−1} p, where 0 < p < 1. 11
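The quotient/remainder split can be sketched as follows (an illustration for these notes; the function name is mine). The quotient q is sent in unary (q zeros and a terminating 1) and the remainder r in truncated binary over {0,…,M−1}:

```python
def golomb(x, M):
    """Golomb code of a non-negative integer x with parameter M:
    unary code of the quotient q, then truncated binary code of r."""
    q, r = divmod(x, M)
    out = "0" * q + "1"             # unary part
    if M == 1:                      # M = 1: pure unary coding, no remainder bits
        return out
    k = M.bit_length() - 1
    u = (1 << (k + 1)) - M          # number of short (k-bit) remainder codes
    if r < u:
        return out + format(r, "0{}b".format(k))
    return out + format(r + u, "0{}b".format(k + 1))
```

With M = 2^k (Rice coding) every remainder takes exactly k bits, e.g. golomb(9, 4) = "001" + "01" = "00101".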
Exp-Golomb coding (Universal) Exp-Golomb coding ≠ Golomb coding. Exp-Golomb coding of order k = 0 is used in some video coding standards such as H.264. The encoding process, for X = {0,1,2,…}: calculate q = ⌊x/2^k⌋ + 1 and n_q = ⌊log_2 q⌋. Exp-Golomb code = unary code of n_q (n_q zeros followed by a 1, which also serves as the MSB of q) + the n_q LSBs of q + the k-bit representation of r = mod(x, 2^k) = x − (q−1)·2^k. 12
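The steps above amount to writing q in binary with n_q leading zeros, then the k remainder bits; a compact sketch (written for these notes, names mine; here the variable `q` holds ⌊x/2^k⌋, so the slide's q is `q + 1`):

```python
def exp_golomb(x, k=0):
    """Order-k Exp-Golomb code of a non-negative integer x."""
    q, r = divmod(x, 1 << k)
    n = (q + 1).bit_length() - 1            # floor(log2(q+1))
    prefix = "0" * n + format(q + 1, "b")   # n zeros + binary of q+1 (n+1 bits)
    return prefix + (format(r, "0{}b".format(k)) if k else "")
```

Order-0 codes come out as 0→1, 1→010, 2→011, 3→00100, …, matching the ue(v) codewords of H.264.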
Huffman code: An example X={1,2,3,4,5}, P=[0.4,0.2,0.2,0.1,0.1]. Tree: p_{1+2+3+4+5} = 1 splits into p_{1+2} = 0.6 (→ p_1 = 0.4, p_2 = 0.2) and p_{3+4+5} = 0.4 (→ p_3 = 0.2 and p_{4+5} = 0.2 → p_4 = 0.1, p_5 = 0.1). A possible code: 1→00, 2→01, 3→10, 4→110, 5→111. 13
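The bottom-up merging of the two least probable nodes can be sketched with a heap (an illustration for these notes; the function name is mine, and tie-breaking may produce a different but equally optimal code):

```python
import heapq

def huffman(probs):
    """Build a Huffman code for {symbol: probability} by repeatedly
    merging the two least probable nodes."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    tie = len(heap)                       # tie-breaker so dicts are never compared
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return heap[0][2]
```

For the source above, any optimal outcome has codeword lengths {2, 2, 2, 3, 3} (as on the slide) or {1, 2, 3, 4, 4}; both give an average length of 2.2 bits/symbol.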
Huffman code: An optimal code Relation between Huffman code and Shannon code: H ≤ L_Huffman ≤ L_Shannon-Fano ≤ L_Shannon < H+1. A stronger result (Gallager, 1978): when p_max ≥ 0.5, L_Huffman ≤ H + p_max < H+1; when p_max < 0.5, L_Huffman ≤ H + p_max + log_2(2(log_2 e)/e) ≈ H + p_max + 0.086 < H + 0.586. Huffman's rules of optimal codes imply that the Huffman code is optimal. When each p_i is a negative power of 2, the Huffman code reaches the entropy. 14
Huffman code: Small |X| problem Problem When |X| is small, the coding efficiency is poor. As a special case, when |X| = 2, Huffman coding cannot compress the data at all: no matter what the probabilities are, each symbol has to be encoded as a single bit. Solutions Solution 1: Work on the extended source X^n rather than X. Solution 2: Dual tree coding = Huffman coding + Tunstall coding 15
Huffman code: Variance problem Problem There are multiple choices of the two smallest probabilities if more than two nodes have the same probability at some step of the coding process; the resulting codes have the same average length but different variances. Huffman codes with a larger variance may cause trouble for data transmission over a CBR (constant bit rate) channel: a larger buffer is needed. Solution Merge shorter subtrees first. (A single node's height is 0.) 16
Modified Huffman code Problem If |X| is too large, the construction of the Huffman tree takes too long and the memory used for the tree is too demanding. Solution Divide X into two sets X_1 = {s_i | p(s_i) > 2^{−v}}, X_2 = {s_i | p(s_i) ≤ 2^{−v}}. Perform Huffman coding for the new set X_3 = X_1 ∪ {x_2}, where x_2 is a super-symbol representing all of X_2. Append f(x_2) as the prefix of the naive binary representation of each symbol in X_2. 17
Huffman's rules of making optimal codes Source statistics: P = P_0 = [p_1,…,p_m], where p_1 ≥ … ≥ p_{m−1} ≥ p_m. Rule 1: L_1 ≤ … ≤ L_{m−1} = L_m. Rule 2: If L_1 ≤ … ≤ L_{m−2} < L_{m−1} = L_m, then f(x_{m−1}) and f(x_m) differ from each other only in the last bit, i.e., f(x_{m−1}) = b0 and f(x_m) = b1, where b is a sequence of L_m − 1 bits. Rule 3: Each possible bit sequence of length L_m − 1 must be either a codeword or the prefix of some codeword. Answers: Read Section 5.2.1 (pp. 122-123) of the following book: Yun Q. Shi and Huifang Sun, Image and Video Compression for Multimedia Engineering, 2nd Edition, CRC Press, 2008 18
Justify Huffman's rules Rule 1 If L_i > L_{i+1}, we can swap the two codewords to get a smaller average codeword length (when p_i = p_{i+1} the swap makes no difference, though). If L_{m−1} < L_m, then L_1,…,L_{m−1} < L_m and the last bit of f(x_m) is redundant. Rule 2 If f(x_{m−1}) and f(x_m) do not share the same parent node, then the last bits of both codewords are redundant. Rule 3 If there is an unused bit sequence of length L_m − 1, we can use it to replace f(x_m) with a shorter codeword. 19
Arithmetic Coding
Why do we need a new coding algorithm? Problems with Huffman coding Each symbol in X is represented by at least one bit. Coding for X^n: a Huffman tree with |X|^n leaf nodes needs to be constructed, so the value of n cannot be too large. Encoding with a Huffman tree is quite easy, but decoding can be difficult, especially when |X| is large. Dynamic Huffman coding is slow due to the update of the Huffman tree from time to time. Solution: Arithmetic coding Encode a number of symbols (a message of a possibly very large size n) incrementally (progressively) as a binary fraction in a real range [low, high) ⊆ [0,1). 21
The name of the game: The history Shannon-Fano-Elias coding (sometimes simply Elias coding) Shannon actually mentioned something about it in 1948. Elias invented the recursive implementation, but never published it. Abramson introduced Elias' idea in his book Information Theory and Coding (1963) as a note on pages 61-62. Practical arithmetic coding Rissanen and Pasco proposed finite-precision implementations in 1976 independently. The name "arithmetic coding" was coined by Rissanen. The idea was patented by IBM (Langdon and Rissanen as the inventors) in the 1970s, and afterwards more related patents were filed. Witten, Neal and Cleary published source code in C in 1987, which made arithmetic coding more popular. Range coding It was proposed by Martin in 1979, but it is just a different (patent-free) way of looking at arithmetic coding and implementing it. 22
Associating input symbol(s) with intervals Given a source (X,P): P=[p(x_1),…,p(x_i),…,p(x_m)]. For any symbol x_i ∈ X, the associated sub-interval is [p_c(x_i), p_c(x_{i+1})), where p_c(x_{i+1}) = p_c(x_i)+p(x_i), p_c(x_1) = 0 and p_c(x_{m+1}) = p_c(x_m)+p(x_m) = 1. For any n-symbol (n>1) word x = (x_{i_1},…,x_{i_n}) ∈ X^n, the associated sub-interval is [p_c(x), p_c(x)+p(x)), where p_c(x) is the starting point of the interval corresponding to x and p(x) = p(x_{i_1})⋯p(x_{i_n}) for a memoryless random source. 23
Arithmetic coding: An example X={A,B,C,D}, P=[0.6,0.2,0.1,0.1]. Input: x=acd Output: y [0.534, 0.54) 24
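The successive narrowing of [low, high) in this example can be reproduced with a few lines of Python (a sketch written for these notes; names are mine, and infinite-precision floats stand in for a real finite-precision coder):

```python
def arith_interval(message, probs):
    """Narrow [low, high) successively for each symbol of the message."""
    cum, acc = {}, 0.0
    for s, p in probs:                  # cumulative distribution per symbol
        cum[s] = (acc, acc + p)
        acc += p
    low, high = 0.0, 1.0
    for s in message:
        rng = high - low
        c_lo, c_hi = cum[s]
        low, high = low + rng * c_lo, low + rng * c_hi
    return low, high
```

For the source above, `arith_interval("ACD", [("A", 0.6), ("B", 0.2), ("C", 0.1), ("D", 0.1)])` narrows the interval A: [0, 0.6) → AC: [0.48, 0.54) → ACD: [0.534, 0.54), up to floating-point rounding, matching the slide.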
Arithmetic coding: Which fraction? Which fraction in [p_c(x), p_c(x)+p(x)) should be chosen? One proper choice: take l = ⌈log_2(1/p(x))⌉ + 1, so that 1/2^l ≤ p(x)/2, and choose f(x) to be the l-bit representation of the binary integer i such that (i−1)/2^l ≤ p_c(x) < i/2^l < p_c(x)+p(x). This leads to H_b(P) ≤ L < H_b(P)+2/n. Assuming the above choice is used to design a code f: X^n → Y*, it can be proved that f is a PF code. n can be as large as the size of the input message x. 25
Arithmetic coding: Termination problem Termination of the original message x Transmit the size of x to the decoder separately. Add an "end of message" symbol to X. Make use of semantic information. Termination of the encoded bitstream f(x) Transmit the number of bits to the decoder separately. Use the choice of fraction introduced on the previous slide (with a known value of m). Add an "end of message" symbol to X. Make use of semantic information. 26
Arithmetic coding: Performance Arithmetic coding: H_b(P) ≤ L < H_b(P)+2/n Huffman coding: H_b(P) ≤ L < H_b(P)+1/n Arithmetic coding worse than Huffman coding? Theoretically yes, if all the entropy coding schemes work on the same extended source (X^n, P^n). Practically no: the complexity of almost all other entropy codes (including Huffman code and Shannon code) soon reaches its computational limit as n → ∞, but arithmetic coding works in an incremental way, so n can be as large as the size of the to-be-encoded message itself. In practice, arithmetic coding beats Huffman coding! 27
Arithmetic coding: Noticeable features Non-block coding (incremental coding) The output of the encoder is not the concatenation of codewords of consecutive symbols. Range coding The encoded message is actually a range in [0,1]. Separation between coding and the source's statistics The probability is used to divide a range within the coding process (instead of before it). Arithmetic coding is inherently dynamic/adaptive/context-based coding. 28
Arithmetic coding: Practical issues Finite-precision problem Approximating the probabilities and the arithmetic operations in finite precision causes a loss of coding efficiency, but it is negligible if the precision is high enough. Common bits of the two end points can be removed immediately and sent out; a renormalization process is then needed to re-calculate the new values of the two end points. Carry-propagation problem A carry (caused by the probability addition) may propagate over many bits (such as 0.011111110 → 0.100000000), so an extra register has to be used to handle this problem. 29
Binary arithmetic coding X = Y = {0,1}, P = [p_0, 1−p_0]. n-bit finite precision is used: initially, [0,1) is represented as [0, 2^n−1). Update the range as usual: low' = low or low + (high−low)·p_0; high' = low + (high−low)·p_0 or high. Do nothing while high − low ≥ 2^{n−1}. Do renormalization when high − low < 2^{n−1}: When low, high < 2^{n−1}: output a 0-bit, and rescale [0, 2^{n−1}) → [0, 2^n−1). When low, high ≥ 2^{n−1}: output a 1-bit, subtract 2^{n−1} from low and high, and rescale [2^{n−1}, 2^n−1) → [0, 2^n−1). When low < 2^{n−1} (low = 01…) but high ≥ 2^{n−1} (high = 10…): store an unknown bit (which is the complement of the next known bit) in the buffer, subtract 2^{n−2} from low and high, and rescale [2^{n−2}, 2^{n−1}+2^{n−2}) → [0, 2^n−1). After all symbols are encoded, terminate the process by sending out a sufficient number of bits to represent the value of low. 30
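The update-and-renormalize loop above can be sketched end to end in Python, in the style of the Witten-Neal-Cleary (1987) integer implementation mentioned earlier. This is an illustrative sketch written for these notes (all names are mine), not the coder of any particular standard; the `pending` counter plays the role of the "unknown bit" buffer on the slide:

```python
def ac_encode(bits, p0, n=16):
    """Binary arithmetic encoder with n-bit integer precision."""
    TOP, HALF, QUARTER = 1 << n, 1 << (n - 1), 1 << (n - 2)
    low, high, pending, out = 0, TOP - 1, 0, []

    def emit(b):
        nonlocal pending
        out.append(b)
        out.extend([1 - b] * pending)    # flush deferred underflow bits
        pending = 0

    for bit in bits:
        rng = high - low + 1
        split = low + int(rng * p0) - 1  # end of the '0' sub-interval
        if bit == 0:
            high = split
        else:
            low = split + 1
        while True:                      # renormalization
            if high < HALF:              # both in lower half: output 0
                emit(0)
            elif low >= HALF:            # both in upper half: output 1
                emit(1); low -= HALF; high -= HALF
            elif low >= QUARTER and high < HALF + QUARTER:
                pending += 1             # straddling the midpoint: defer a bit
                low -= QUARTER; high -= QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
    pending += 1                         # terminate: select a point in [low, high]
    emit(0 if low < QUARTER else 1)
    return out

def ac_decode(code, nsym, p0, n=16):
    """Matching decoder: mirrors the encoder's interval updates."""
    TOP, HALF, QUARTER = 1 << n, 1 << (n - 1), 1 << (n - 2)
    code = code + [0] * n                # decoder may read past the end
    low, high, pos, value = 0, TOP - 1, 0, 0
    for _ in range(n):
        value = 2 * value + code[pos]; pos += 1
    out = []
    for _ in range(nsym):
        rng = high - low + 1
        split = low + int(rng * p0) - 1
        if value <= split:
            out.append(0); high = split
        else:
            out.append(1); low = split + 1
        while True:
            if high < HALF:
                pass
            elif low >= HALF:
                low -= HALF; high -= HALF; value -= HALF
            elif low >= QUARTER and high < HALF + QUARTER:
                low -= QUARTER; high -= QUARTER; value -= QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
            value = 2 * value + (code[pos] if pos < len(code) else 0)
            pos += 1
    return out
```

Decoding with the same p_0 and symbol count recovers the input exactly, since encoder and decoder compute identical `split` values at every step.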
Multiplication-free implementation For binary arithmetic coding, we can further approximate high − low ≈ α = ¾ (since high − low ∈ [½, 1)). Then the update of low and high can be approximated without doing the multiplication. The optimal value of α actually depends on the probability p_0, and is in most cases smaller than ¾ (a typical value is ⅔ or around 0.7). When p_0 > 1/(2α) = ⅔, it is possible that αp_0 > ½ > high − low, so high' < low', which is a serious error. Solution: dynamically exchange p_0 and p_1 such that p_0 < ½ always holds; then 0 = LPS (less/least probable symbol) and 1 = MPS (more/most probable symbol). 32
The Q-coder family Q-coder The first multiplication-free binary arithmetic coder. It is less efficient than ideal arithmetic coders, but the loss of compression efficiency is less than 6%. Why was it named after Q? With high − low = A and p_LPS = Q, the update is low' = low or low + (A − Q), A' = A − Q or Q. The renormalization process ensures A ∈ [0.75, 1.5), so A ≈ 1 (α = ⅔). QM-coder (JPEG/JBIG) An enhanced edition of the Q-coder. MQ-coder (JPEG2000/JBIG2) A context-based variant of the QM-coder. [Figure: the interval of width A above low split into an MPS part A − Q ≈ A(1−Q) and an LPS part Q ≈ AQ.] 33
Dictionary Coding
The basic idea Use a static/dynamic dictionary and encode a symbol or a sequence of symbols as an index into the dictionary. The dictionary can be implicit or explicit. The dictionary can be considered as a (maybe very rough) approximation of the probability distribution of the source. 35
Static dictionary coding: Digram coding A static dictionary of size 2^n contains the |X| symbols in X and also the 2^n − |X| most frequently used pairs of symbols in X^2. An example: X = {A,B,C,D,E}, n = 3. Code/Index table: A→000, B→001, C→010, D→011, E→100, AB→101, AD→110, BE→111. Encoding: ABDEBE → 101 011 100 111. 36
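Greedy digram encoding tries a two-symbol dictionary entry first and falls back to a single symbol; a sketch using the table from this example (the function name is mine):

```python
def digram_encode(text, table):
    """Greedy digram coding: prefer a two-symbol entry, else one symbol."""
    out, i = [], 0
    while i < len(text):
        if i + 1 < len(text) and text[i:i + 2] in table:
            out.append(table[text[i:i + 2]]); i += 2
        else:
            out.append(table[text[i]]); i += 1
    return "".join(out)

table = {"A": "000", "B": "001", "C": "010", "D": "011", "E": "100",
         "AB": "101", "AD": "110", "BE": "111"}
```

Then `digram_encode("ABDEBE", table)` yields 101 011 100 111, the bitstream from the slide (AB, D, E, BE).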
Adaptive dictionary coding LZ (Lempel-Ziv) family LZ77 (LZ1) and LZ78 (LZ2) DEFLATE: LZ77 + Huffman Coding LZMA (Lempel-Ziv-Markov chain-algorithm): DEFLATE + Arithmetic Coding LZW (Lempel-Ziv-Welch): Improved LZ78 LZSS (Lempel-Ziv-Storer-Szymanski): Improved LZ77 LZO (Lempel-Ziv-Oberhumer): Fast LZ LZRW (Lempel-Ziv-Ross Williams): Improved LZ77 LZJB (Lempel-Ziv-Jeff Bonwick): Improved LZRW1 37
LZ77 (Sliding window compression) A sliding window = a search buffer + a look-ahead buffer. Only exploits local redundancies. Uses an implicit dictionary. Encoding: each step outputs a triple (o, l, c) — the offset o back into the search buffer, the match length l, and the next symbol c (o = l = 0 when there is no match); e.g., a match "BAD" followed by "A" is encoded as (o, l, "A"). [Figure: sliding window over the sequence …AEDBADCEBADAEA…, divided into a search buffer and a look-ahead buffer.] 38
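A toy LZ77 encoder over such a window can be sketched as follows (written for these notes; names and the tiny buffer sizes are mine, and a real implementation would use much larger windows and faster matching). Matches are allowed to run into the look-ahead buffer, which is how repeated patterns are captured:

```python
def lz77_encode(data, search=6, lookahead=6):
    """Emit (offset, length, next_symbol) triples over a sliding window."""
    i, out = 0, []
    while i < len(data):
        best_o, best_l = 0, 0
        start = max(0, i - search)
        for o in range(1, i - start + 1):        # offsets back into the search buffer
            l = 0
            while (l < lookahead and i + l < len(data) - 1
                   and data[i + l - o] == data[i + l]):
                l += 1                           # may overlap into the look-ahead
            if l > best_l:
                best_o, best_l = o, l
        out.append((best_o, best_l, data[i + best_l]))
        i += best_l + 1
    return out
```

For example, "ABABABA" encodes as (0,0,'A'), (0,0,'B'), (2,4,'A'): the third triple copies four symbols starting two positions back, overlapping its own output.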
LZ78 Problems with LZ77 Nothing can be captured out of the sliding window. Solution: Using an explicit dictionary At the beginning, the dictionary is empty. An encoded symbol or a sequence of symbols that cannot be found in the dictionary is added into the dictionary as a new entry. The output is (i,c), where i is the dictionary index and c is the next symbol following the dictionary entry. 39
LZW Main advantage over LZ78 No need to transmit the next symbol c, but only the index i. Encoding procedure Step 1: Initialize the dictionary to include all single symbols in X. Step 2: Search the input sequence in the dictionary until a string x_i…x_j can be found but x_i…x_j x_{j+1} cannot; then output the index of x_i…x_j and add x_i…x_j x_{j+1} as a new entry in the dictionary. Step 3: Repeat Step 2 until all symbols are encoded. 40
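The three steps above can be sketched directly (an illustration for these notes; the function name and the sorted-alphabet initialization are mine):

```python
def lzw_encode(data):
    """LZW: dictionary initialised with single symbols; emit indices only."""
    if not data:
        return []
    dic = {ch: i for i, ch in enumerate(sorted(set(data)))}  # Step 1
    w, out = "", []
    for ch in data:
        if w + ch in dic:
            w += ch                       # keep extending the current match
        else:
            out.append(dic[w])            # Step 2: emit index of longest match
            dic[w + ch] = len(dic)        # ... and add the unmatched extension
            w = ch
    out.append(dic[w])                    # flush the final match
    return out
```

With X = {A, B} (indices A=0, B=1), "ABABABA" encodes as [0, 1, 2, 4]: the new entries AB=2, BA=3, ABA=4 are built up and reused as encoding proceeds.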
Run-Length Coding
1-D run-length coding Statistical model: a discrete Markov source S = {S_1, S_2} with 4 transition probabilities P(S_i|S_j), i, j ∈ {1,2}. Runs: repetitions of the symbols S_1 or S_2. How to code: transmit (S_i, Run1, Run2, …), i.e., the first symbol followed by the successive run-lengths. All the run-lengths form a new random source, which can be further coded with a (modified) Huffman coder or a universal code. 42
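Extracting the (S_i, Run1, Run2, …) representation from a binary sequence is a one-pass scan (a minimal sketch for these notes; the function name is mine):

```python
def run_lengths(bits):
    """Return (first_symbol, list of run lengths) for a non-empty sequence."""
    first = bits[0]
    runs, count = [], 1
    for prev, cur in zip(bits, bits[1:]):   # compare each symbol with its predecessor
        if cur == prev:
            count += 1
        else:
            runs.append(count)              # a run ends: record its length
            count = 1
    runs.append(count)                      # record the final run
    return first, runs
```

For example, "0001101" becomes ('0', [3, 2, 1, 1]); the decoder only needs the first symbol, since the runs alternate between S_1 and S_2.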
Lossless Image Coding
How entropy coding is used Huffman/arithmetic coding + dictionary coding + universal codes + predictive coding: spatial differences tend to have a monotonic probability distribution. Universal codes + run-length coding: the run-length pairs tend to have a monotonic probability distribution. 44
Lossless image coding standards JBIG (Joint Bi-level Image Experts Group): QM-coder Lossless JPEG family: Huffman coding/arithmetic coding JBIG2: MQ-coder GIF (Graphics Interchange Format): LZW PNG (Portable Network Graphics): DEFLATE = LZ77 + Huffman coding TIFF (Tagged Image File Format): RLE, LZW Compressed BMP and PCX: RLE 45
Lossy coding followed by lossless coding [Block diagram, as in the Review: Input Image/Video → Pre-Processing (A/D conversion, color space conversion, pre-filtering, partitioning) → Lossy Coding (quantization; transform coding; predictive coding: differential coding, motion estimation and compensation; model-based coding) → Lossless Coding (entropy coding, dictionary-based coding, run-length coding, context-based coding) → Post-Processing (post-filtering) → Encoded Image/Video] 46
Data Compression: Hall of Fame
Hall of Fame Abraham Lempel (1936-) IEEE Richard W. Hamming Medal (2007) Fellow of IEEE (1982) Jacob Ziv (1931-) IEEE Richard W. Hamming Medal (1995) Israel Prize (1993) Member (1981) and former President (1995-2004) of Israel National Academy of Sciences and Humanities Fellow of IEEE (1973) 48
References
References for further reading Norman L. Biggs, Codes: An Introduction to Information Communication and Cryptography, Sections 4.6-4.8, Springer, 2008 Khalid Sayood, Introduction to Data Compression, Chapter 4 "Arithmetic Coding" and Chapter 5 "Dictionary Techniques", 3rd Edition, Morgan Kaufmann, 2005 Yun Q. Shi and Huifang Sun, Image and Video Compression for Multimedia Engineering: Fundamentals, Algorithms, and Standards, Chapter 6 "Run-Length and Dictionary Coding - Information Theory Results (III)", 2nd Edition, CRC Press, 2008 David S. Taubman and Michael W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards and Practice, Section 2.3 "Arithmetic Coding", Kluwer Academic Publishers, 2002 Tinku Acharya and Ping-Sing Tsai, JPEG2000 Standard for Image Compression: Concepts, Algorithms and VLSI Architectures, Sections 2.3-2.5, John Wiley & Sons, Inc., 2004 Mohammed Ghanbari, Standard Codecs: Image Compression to Advanced Video Coding, Section 3.4.2 "Arithmetic coding", IEE, 2003 50