1 Data Compression Fundamentals Touradj Ebrahimi Touradj.Ebrahimi@epfl.ch
2 Several classifications of compression methods are possible. Based on data type:» Generic data compression» Audio compression» Image compression» Video compression» Virtual reality compression
3 Based on compression type:» Lossless: the decoded (uncompressed) data will be exactly equal to the original (modest compression ratios).» Lossy: the decoded (uncompressed) data will be an approximation of the original, but not necessarily identical to it (higher compression ratios).
4 Information theory was developed to provide a mathematical tool to better design data compression algorithms. Entropy of the source generating the data: it is impossible to compress data in a lossless way at a bitrate lower than the entropy of the source that generated it. The entropy H of the source generating the data is in general impossible to measure in practice, due to the large number of interdependencies (of infinite order) and the non-stationarities. Usually, a zero-order entropy measure is used to estimate the entropy of the source: H0 = -Σ_{i∈S} p_i · log2(p_i)
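The zero-order estimate above can be computed directly from symbol frequencies. A minimal sketch (the function name is my own; observed relative frequencies stand in for the true probabilities p_i):

```python
import math
from collections import Counter

def zero_order_entropy(data):
    """Estimate the zero-order entropy H0 (in bits per symbol) of a
    symbol sequence, using observed frequencies as probabilities."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Two equiprobable symbols give H0 = 1 bit per symbol.
print(zero_order_entropy("ABABABAB"))  # → 1.0
```

Note that H0 ignores dependencies between symbols, so it is only an upper-bound style estimate: a sequence like "ABABABAB" is perfectly predictable from context, yet its zero-order entropy is still 1 bit per symbol.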
5 Lossless data compression is widely used in computers. Most methods are based on one of the following approaches:» Huffman coding» Arithmetic coding» Substitutional (dictionary-based) coding
6 Huffman codes represent every symbol with a number of bits that decreases as its frequency (probability of appearance) increases. Mechanism: a binary tree is built bottom-up by repeatedly grouping the symbols (or subtrees) with the smallest probabilities, and by assigning the sum of the probabilities of the children to the parent node. Each symbol is then represented by one leaf of the tree and coded by the sequence of branch labels on the path from the root to that leaf.
7 Huffman coding (example) Symbols and frequencies: A = 15, B = 7, C = 6, D = 6, E = 5
8 Huffman coding (example) [Initial leaves: A (15), B (7), C (6), D (6), E (5)]
9 Huffman coding (example) [Merge the two smallest: D (6) + E (5) → DE (11)]
10 Huffman coding (example) [Merge B (7) + C (6) → BC (13)]
11 Huffman coding (example) [Merge BC (13) + DE (11) → BCDE (24)]
12 Huffman coding (example) [Merge A (15) + BCDE (24) → root ABCDE (39)]
13 Huffman coding (example) Resulting code: A = 0, B = 100, C = 101, D = 110, E = 111
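The bottom-up merging on the preceding slides can be sketched with a priority queue. This is a minimal illustration (function name mine); ties between equal frequencies may be broken differently than on the slides, giving a different but equally optimal code:

```python
import heapq

def huffman_codes(freqs):
    """Build a Huffman code table from {symbol: frequency} by
    repeatedly merging the two least-frequent nodes."""
    # Heap entries: (frequency, unique tie-breaker, {symbol: code so far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)
        f1, _, c1 = heapq.heappop(heap)
        # Prefix 0/1 onto the codes of each child subtree.
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (f0 + f1, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5})
print(codes["A"])  # → 0  (the most frequent symbol gets the shortest code)
```

For the slide's frequencies, any optimal code costs 15·1 + (7+6+6+5)·3 = 87 bits in total, whichever way the ties are broken.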
14 Arithmetic codes represent a sequence of symbols by assigning to it the binary representation of a subinterval of [0, 1) whose length is smaller than one (equal to the probability of the sequence).» At least as efficient as Huffman codes» Separate the model from the bit assignment, and therefore allow a simpler adaptive scheme» Computationally efficient
15 Arithmetic coding (example) P(A) = 1/3, P(B) = 2/3 [The unit interval is split in proportion to the probabilities: B = [0, 2/3), A = [2/3, 1)]
16 Arithmetic coding (example) [Each subinterval is split again: BB = [0, 4/9), BA = [4/9, 2/3), AB = [2/3, 8/9), AA = [8/9, 1)]
17 Arithmetic coding (example) [After three symbols the interval is split a third time: BBB = [0, 8/27), then BBA, BAB, BAA, ABB, ABA, AAB, and AAA at the top]
18 Arithmetic coding (example) P(A) = 1/3, P(B) = 2/3 Each three-symbol segment is coded by a short binary fraction lying inside its interval: AAA → .11111 (31/32), AAB → .1111 (15/16), ABA → .1110 (14/16), ABB → .110 (6/8), BAA → .1010 (10/16), BAB → .100 (4/8), BBA → .011 (3/8), BBB → .01 (1/4). More probable segments get shorter codes.
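The interval narrowing on these slides can be sketched with exact rational arithmetic. A minimal illustration (function name mine), using the slide's layout with B = [0, 2/3) at the bottom and A = [2/3, 1) at the top; a real coder would then emit a binary fraction inside the final interval:

```python
from fractions import Fraction

def interval_for(sequence):
    """Narrow [0, 1) to the subinterval representing a sequence of
    'A'/'B' symbols, with P(A) = 1/3 and P(B) = 2/3 as on the slides."""
    low, high = Fraction(0), Fraction(1)
    for s in sequence:
        width = high - low
        if s == "B":           # B takes the lower 2/3 of the interval
            high = low + width * Fraction(2, 3)
        else:                  # A takes the upper 1/3
            low = low + width * Fraction(2, 3)
    return low, high

# "BB" narrows [0,1) to [0, 4/9), matching the slide's subdivision.
print(interval_for("BB"))  # → (Fraction(0, 1), Fraction(4, 9))
```

The final interval for an n-symbol sequence has length equal to the product of the symbol probabilities, so it can be identified with roughly -log2(probability) bits, which is where the efficiency of arithmetic coding comes from.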
19 Adaptive arithmetic coding [Block diagram: the encoder and the decoder each maintain an identical probability model, updated as symbols are processed]
20 Substitutional codes build a dictionary of strings (words) while coding a sequence of symbols, and encode repeated strings by their position in the dictionary rather than symbol by symbol.» Efficient compression» Computationally efficient
21 Substitutional coding (example) input: / W E D /W E ... output: / W E D 256 E ... new dictionary entries: 256=/W 257=WE 258=ED 259=D/ 260=/WE... 261=...
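The slide's table corresponds to LZW, the classic substitutional scheme: single characters start in the dictionary, and each time a longest match is emitted, the match plus the next character is added at the next free index (starting at 256). A minimal encoder sketch (function name mine), run on the customary "/WED/WE/WEE/WEB/WET" input that the slide's entries suggest:

```python
def lzw_encode(text):
    """LZW encoding sketch: emit dictionary indices for the longest
    known matches, growing the dictionary as we go."""
    dictionary = {chr(i): i for i in range(256)}  # single chars pre-loaded
    next_code = 256
    match, output = "", []
    for ch in text:
        if match + ch in dictionary:
            match += ch                      # extend the current match
        else:
            output.append(dictionary[match]) # emit longest match so far
            dictionary[match + ch] = next_code
            next_code += 1
            match = ch
    if match:
        output.append(dictionary[match])
    return output

# First six output codes: /, W, E, D, 256 (= "/W"), E, as on the slide.
print(lzw_encode("/WED/WE/WEE/WEB/WET")[:6])  # → [47, 87, 69, 68, 256, 69]
```

The decoder can rebuild the same dictionary from the code stream alone, so no dictionary needs to be transmitted.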
22 For lossy coding, the rate-distortion theory was developed. Rate-distortion optimization: find the lowest bitrate possible for a given distortion, or the lowest distortion for a given bitrate. The most popular distortion measure is the mean square error (MSE): MSE = (1/N) Σ_{i=1..N} [x(i) - x̂(i)]² The MSE does not always reflect the real distortion perceived by the human visual system.
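As a quick illustration of the MSE formula above (function name mine), averaging the squared differences between original and reconstructed samples:

```python
def mse(x, x_hat):
    """Mean square error between original samples x and
    reconstructed samples x_hat (sequences of equal length)."""
    assert len(x) == len(x_hat)
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

# One sample off by 2 out of four samples: (0 + 0 + 0 + 4) / 4 = 1.0
print(mse([1, 2, 3, 4], [1, 2, 3, 6]))  # → 1.0
```

Because MSE weights every sample error equally, two reconstructions with identical MSE can look very different to a viewer, which is the slide's point about the human visual system.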
23 [Rate-distortion plot: distortion versus rate; the optimal R-D curve is decreasing, and the best solution lies on this curve at the target rate]