Chapter 5: Data compression. Chapter 5 outline


Mitchell Lester
5 years ago

1 Chapter 5: Data compression
Chapter 5 outline: 12 balls weighing problem; examples of codes; Kraft inequality; optimal codes and bounds; Kraft inequality for uniquely decodable codes; Huffman codes; Shannon-Fano-Elias coding.

2 You are given 12 balls, all equal in weight except for one that is either heavier or lighter. You are also given a two-pan balance to use. In each use of the balance you may put any number of the 12 balls on the left pan and the same number on the right pan, then push a button to initiate the weighing. There are three possible outcomes: the weights are equal, the balls on the left are heavier, or the balls on the left are lighter. Your task is to design a strategy that determines which ball is the odd one, and whether it is heavier or lighter than the others, in as few uses of the balance as possible.
While thinking about this problem, you may find it helpful to consider the following questions:
(a) How can one measure information?
(b) When you have identified the odd ball and whether it is heavy or light, how much information have you gained?
(c) Once you have designed a strategy, draw a tree showing, for each of the possible outcomes of a weighing, what weighing you perform next. At each node in the tree, how much information have the outcomes so far given you, and how much information remains to be gained?
(d) How much information is gained when you learn (i) the state of a flipped coin; (ii) the states of two flipped coins; (iii) the outcome when a four-sided die is rolled?
(e) How much information is gained on the first step of the weighing problem if 6 balls are weighed against the other 6? How much is gained if 4 are weighed against 4 on the first step, leaving out 4 balls?
12 balls weighing (lighter or heavier): Total information contained? How much information does each weighing give you (ideally)? Number of weighings needed? Strategy?
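The counting questions above can be checked numerically. The sketch below assumes, as the problem implies, that all 24 hypotheses (12 balls, each possibly heavy or light) are equally likely. The helper names `entropy` and `first_weighing_outcomes` are ours, not from the source.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

N_BALLS = 12
HYPOTHESES = 2 * N_BALLS  # any ball may be the odd one, and it may be heavy or light

# Total information needed to identify the odd ball and whether it is heavy or light.
total_info = math.log2(HYPOTHESES)
# A weighing has three outcomes, so it yields at most log2(3) bits.
max_info_per_weighing = math.log2(3)
min_weighings = math.ceil(total_info / max_info_per_weighing)

def first_weighing_outcomes(k, n=N_BALLS):
    """Outcome probabilities [left heavier, left lighter, balanced] for k vs k balls.

    'Left heavier' happens when the odd ball is on the left and heavy, or on
    the right and light: 2k of the 2n equally likely hypotheses.
    """
    left_heavier = 2 * k / (2 * n)
    left_lighter = 2 * k / (2 * n)
    balanced = (n - 2 * k) / n  # the odd ball is among those left off the pans
    return [left_heavier, left_lighter, balanced]

print(f"total information: {total_info:.3f} bits; at least {min_weighings} weighings")
print(f"6 vs 6: {entropy(first_weighing_outcomes(6)):.3f} bits")
print(f"4 vs 4: {entropy(first_weighing_outcomes(4)):.3f} bits")
```

Weighing 6 against 6 never balances, so the two possible outcomes carry only 1 bit. Weighing 4 against 4 makes all three outcomes equally likely, which gives the maximum log2 3 ≈ 1.58 bits. Since log2 24 ≈ 4.58 bits must be gained, at least 3 weighings are needed. For question (d), `entropy([0.5, 0.5])` and `entropy([0.25] * 4)` give 1 and 2 bits.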

3 [Figure 4.2: An optimal solution to the weighing problem. (Mackay textbook, p. 69)]
Examples of codes: What is X? What is D? What is D*? What is H(X)? What is L(C)? Decode.
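In the usual notation, X is the source alphabet, D is the code alphabet, and D* is the set of finite strings over D. A code is a map C: X → D*. With an instantaneous (prefix) code, decoding can proceed bit by bit, emitting a symbol as soon as the buffered bits match a codeword. The sketch below shows this. The code table `EXAMPLE_CODE` is hypothetical, chosen only for illustration.

```python
def decode_prefix(bits, code):
    """Decode a bit string with a prefix code given as {symbol: codeword}.

    Valid only for prefix (instantaneous) codes: the first codeword that
    matches the buffered bits is the right one, so no look-ahead is needed.
    """
    inverse = {codeword: symbol for symbol, codeword in code.items()}
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            symbols.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError(f"trailing bits {buffer!r} are not a complete codeword")
    return symbols

# Hypothetical example code over D = {0, 1}.
EXAMPLE_CODE = {1: "0", 2: "10", 3: "110", 4: "111"}
print(decode_prefix("0101100111", EXAMPLE_CODE))
```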

4 Examples of codes
Meaning in lay terms?
[Table 5.1: Classes of codes. Columns: X; Singular; Nonsingular, but not uniquely decodable; Uniquely decodable, but not instantaneous; Instantaneous.]
[Figure: nested classes of codes: all codes ⊃ nonsingular codes ⊃ uniquely decodable codes ⊃ instantaneous codes.]
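The code classes can be tested mechanically. The sketch below checks nonsingularity directly and checks the prefix condition pairwise. For unique decodability it uses the Sardinas-Patterson dangling-suffix test, a standard technique that the slides do not name. The four example codes are our own illustrations, one per class.

```python
def is_nonsingular(codewords):
    """Every symbol gets a distinct codeword."""
    return len(set(codewords)) == len(codewords)

def is_prefix_free(codewords):
    """No codeword is a prefix of another, i.e. the code is instantaneous."""
    n = len(codewords)
    return not any(i != j and codewords[j].startswith(codewords[i])
                   for i in range(n) for j in range(n))

def _dangling_suffixes(prefixes, words):
    """Nonempty w such that p + w == s for some p in prefixes and s in words."""
    return {s[len(p):] for p in prefixes for s in words
            if len(s) > len(p) and s.startswith(p)}

def is_uniquely_decodable(codewords):
    """Sardinas-Patterson test: UD iff no dangling-suffix set contains a codeword."""
    if not is_nonsingular(codewords):
        return False
    code = set(codewords)
    suffixes = _dangling_suffixes(code, code)
    seen = set()
    while suffixes:
        if suffixes & code:          # a dangling suffix is itself a codeword
            return False
        key = frozenset(suffixes)
        if key in seen:              # the suffix sets have started to cycle
            return True
        seen.add(key)
        suffixes = _dangling_suffixes(code, suffixes) | _dangling_suffixes(suffixes, code)
    return True

def classify(codewords):
    if not is_nonsingular(codewords):
        return "singular"
    if not is_uniquely_decodable(codewords):
        return "nonsingular, but not uniquely decodable"
    if not is_prefix_free(codewords):
        return "uniquely decodable, but not instantaneous"
    return "instantaneous"

for cw in (["0", "0", "0", "0"], ["0", "010", "01", "10"],
           ["10", "00", "11", "110"], ["0", "10", "110", "111"]):
    print(cw, "->", classify(cw))
```

In the second example, "010" can be parsed as "0","10" or as "01","0", which is why that code fails the test.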

5 Code trees
[Figure: a code tree with codewords labeled A, B, C, D.]
Kraft inequality: we want short prefix codes, and the Kraft inequality quantifies the tradeoff.

6 Code tree for Kraft inequality
[Figure 5.2: Code tree for the Kraft inequality.]
Each codeword is represented by a leaf on the tree. The path from the root traces out the symbols of the codeword. A binary example of such a tree is shown in Figure 5.2. The prefix condition on the codewords implies that no codeword is an ancestor of any other codeword on the tree. Hence, each codeword eliminates its descendants as possible codewords.
Let $l_{\max}$ be the length of the longest codeword of the set of codewords. Consider all nodes of the tree at level $l_{\max}$. Some of them are codewords, some are descendants of codewords, and some are neither. A codeword at level $l_i$ has $D^{l_{\max} - l_i}$ descendants at level $l_{\max}$. Each of these descendant sets must be disjoint. Also, the total number of nodes in these sets must be less than or equal to $D^{l_{\max}}$. Hence, summing over all the codewords, we have
$$\sum_i D^{l_{\max} - l_i} \le D^{l_{\max}} \qquad (5.7)$$
or
$$\sum_i D^{-l_i} \le 1, \qquad (5.8)$$
which is the Kraft inequality. Conversely, given any set of codeword lengths $l_1, l_2, \ldots, l_m$ that satisfy the Kraft inequality, we can always construct a tree like the one in Figure 5.2.
Kraft inequality and code budgets
[Figure 5.1 (Mackay textbook, Ch. 5): The symbol coding budget.] The cost $2^{-l}$ of each codeword (with length $l$) is indicated by the size of the box it is written in. The total budget available when making a uniquely decodeable code is 1. You can think of this diagram as showing a codeword supermarket, with the codewords arranged in aisles by their length, and the cost of each codeword indicated by the size of its box on the shelf. If the cost of the codewords that you take exceeds the budget, then your code will not be uniquely decodeable.
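Both directions of the argument above can be tried out in code. The sketch below computes the Kraft sum exactly. For the converse direction it builds a binary prefix code from lengths that satisfy the inequality. The construction takes, in order of increasing length, the next unused node at each depth (the "canonical code" assignment). That is one standard way to realize the tree construction, not necessarily the one the slides intend. Function names are ours, and lengths are assumed to be positive.

```python
from fractions import Fraction

def kraft_sum(lengths, D=2):
    """Exact value of sum_i D**(-l_i)."""
    return sum(Fraction(1, D ** l) for l in lengths)

def prefix_code_from_lengths(lengths):
    """Build a binary prefix code with the given positive codeword lengths.

    Codewords are assigned in order of increasing length; each one is the
    next unused node at its depth, so no codeword is an ancestor of another.
    This always succeeds when the Kraft inequality holds.
    """
    if kraft_sum(lengths) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codes = [None] * len(lengths)
    value, depth = 0, 0
    for i in order:
        value <<= lengths[i] - depth    # move down the tree to depth l_i
        depth = lengths[i]
        codes[i] = format(value, f"0{depth}b")
        value += 1                      # next free node at this depth
    return codes

print(kraft_sum([1, 2, 3, 3]), prefix_code_from_lengths([1, 2, 3, 3]))
```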

7 Prefix codes can be represented on binary trees. Complete prefix codes correspond to binary trees with no unused branches; a prefix code whose tree has unused branches is incomplete.
A prefix code (also known as an instantaneous or self-punctuating code) is one in which no codeword is a prefix of any other codeword. An encoded string can be decoded from left to right without looking ahead to subsequent codewords, because the end of a codeword is immediately recognizable. A prefix code is uniquely decodeable.
The expected length L(C, X) of a symbol code C for ensemble X is
$$L(C,X) = \sum_{x \in \mathcal{A}_X} P(x)\, l(x). \qquad (5.5)$$
We may also write this quantity as
$$L(C,X) = \sum_{i=1}^{I} p_i l_i, \qquad (5.6)$$
where $I = |\mathcal{A}_X|$.
Example: let $\mathcal{A}_X = \{a, b, c, d\}$ and $\mathcal{P}_X = \{1/2, 1/4, 1/8, 1/8\}$, and consider the code C3:

  a_i   c(a_i)   p_i    h(p_i)   l_i
  a     0        1/2    1.0      1
  b     10       1/4    2.0      2
  c     110      1/8    3.0      3
  d     111      1/8    3.0      3

The entropy of X is 1.75 bits, and the expected length L(C3, X) of this code is also 1.75 bits. The sequence of symbols x = (acdbac) is encoded as c+(x) = 0110111100110. C3 is a prefix code and is therefore uniquely decodeable. Notice that the codeword lengths satisfy $l_i = \log_2(1/p_i)$, or equivalently, $p_i = 2^{-l_i}$.
Example: consider the fixed-length code C4 (a → 00, b → 01, c → 10, d → 11) for the same ensemble X. The expected length L(C4, X) is 2 bits.
Example: consider C5 (a → 0, b → 1, c → 00, d → 11). The expected length L(C5, X) is 1.25 bits, which is less than H(X). But the code is not uniquely decodeable: the sequence x = (acdbac) encodes as 000111000, which can also be decoded as (cabdca).
Example: consider the code C6 (a → 0, b → 01, c → 011, d → 111). The expected length L(C6, X) of this code is 1.75 bits. The sequence of symbols x = (acdbac) is encoded as c+(x) = 0011111010011. Is C6 a prefix code? It is not, because c(a) = 0 is a prefix of both c(b) and c(c).
Kraft inequality example: What about L = {2; 2; …}? What about L = {2; 2; 2; …}?
[Figure 5.2 (Mackay textbook, Ch. 5): Selections of codewords made by several codes from section 5.1, including C4 and C6.]
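The numbers in these examples can be reproduced directly. The sketch below uses the codeword tables C3 = {0, 10, 110, 111}, C4 = {00, 01, 10, 11}, C5 = {0, 1, 00, 11} and C6 = {0, 01, 011, 111} for a, b, c, d, which is our reading of the section 5.1 examples. For each code it computes the expected length, the Kraft sum (the share of the unit "budget" spent), and the encoding of acdbac.

```python
import math

P = {"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/8}
CODES = {
    "C3": {"a": "0", "b": "10", "c": "110", "d": "111"},
    "C4": {"a": "00", "b": "01", "c": "10", "d": "11"},
    "C5": {"a": "0", "b": "1", "c": "00", "d": "11"},
    "C6": {"a": "0", "b": "01", "c": "011", "d": "111"},
}

def entropy(p):
    """H(X) in bits."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

def expected_length(code, p):
    """L(C, X) = sum_i p_i * l_i, as in equation (5.6)."""
    return sum(p[x] * len(code[x]) for x in p)

def kraft_sum(code):
    """sum_i 2**(-l_i): how much of the unit budget the code spends."""
    return sum(2.0 ** -len(cw) for cw in code.values())

def encode(code, symbols):
    """Extended code c+: concatenate the codewords of a symbol string."""
    return "".join(code[s] for s in symbols)

print(f"H(X) = {entropy(P)} bits")
for name, code in CODES.items():
    print(name, expected_length(code, P), kraft_sum(code), encode(code, "acdbac"))
```

C5 spends a Kraft sum of 1.5, which exceeds the budget of 1, consistent with it not being uniquely decodeable. In this sketch, `encode(CODES["C5"], "acdbac")` and `encode(CODES["C5"], "cabdca")` return the same string. C6 spends exactly 1.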

8 Extended Kraft inequality Kraft inequality for uniquely decodable codes
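The two results named in these titles can be stated compactly, with the usual counting argument for the uniquely decodable case. This is a sketch along the lines of the standard proof in Cover and Thomas. The symbol $a(m)$, the number of length-$k$ source sequences whose encoding has length $m$, is notation introduced here.

```latex
% Extended Kraft inequality: a D-ary prefix code with a countably infinite
% set of codeword lengths l_1, l_2, ... satisfies
\sum_{i=1}^{\infty} D^{-l_i} \le 1 .

% Kraft (McMillan) inequality for uniquely decodable codes: the same bound holds.
% Sketch: raise the Kraft sum to the k-th power, with l(x^k) = sum_j l(x_j),
% and group the terms by the total code length m.
\Big(\sum_{x \in \mathcal{X}} D^{-l(x)}\Big)^{k}
  = \sum_{x^k \in \mathcal{X}^k} D^{-l(x^k)}
  = \sum_{m=1}^{k\, l_{\max}} a(m)\, D^{-m}
  \le \sum_{m=1}^{k\, l_{\max}} D^{m} D^{-m}
  = k\, l_{\max},
% because unique decodability maps distinct source sequences to distinct
% code strings, and there are only D^m strings of length m. Hence
\sum_{x \in \mathcal{X}} D^{-l(x)} \le (k\, l_{\max})^{1/k}
  \xrightarrow[k \to \infty]{} 1 .
```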

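9 Optimal codes
An optimal code is a prefix code that minimizes the expected codeword length. It is the solution to:
$$\text{minimize } L = \sum_i p_i l_i \quad \text{over integers } l_1, \ldots, l_m \quad \text{subject to} \quad \sum_i D^{-l_i} \le 1.$$
Bounds on optimal code length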
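Dropping the integer constraint gives $l_i^* = \log_D(1/p_i)$ and expected length $H_D(X)$. Rounding up to $l_i = \lceil \log_D(1/p_i) \rceil$ (the Shannon code) still satisfies the Kraft inequality and gives $H_D(X) \le L < H_D(X) + 1$. Hence the optimal expected length $L^*$ satisfies the same bounds. The binary sketch below checks this for the distribution (0.25, 0.25, 0.2, 0.15, 0.15), chosen here only as an example.

```python
import math

def entropy(probs):
    """Entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def shannon_lengths(probs):
    """l_i = ceil(log2(1/p_i)); these lengths always satisfy the Kraft inequality."""
    return [math.ceil(math.log2(1 / p)) for p in probs]

def expected_length(probs, lengths):
    return sum(p * l for p, l in zip(probs, lengths))

probs = [0.25, 0.25, 0.2, 0.15, 0.15]
lengths = shannon_lengths(probs)
H = entropy(probs)
L = expected_length(probs, lengths)
print(lengths, round(H, 4), round(L, 4), sum(2 ** -l for l in lengths))
```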

10 Block coding Entropy rate and code length
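Coding blocks of n i.i.d. symbols with Shannon lengths $\lceil \log_2 1/p(x^n) \rceil$ gives $nH \le E\,l(X^n) < nH + 1$. The expected length per symbol therefore lies in $[H, H + 1/n)$, and for an i.i.d. source it approaches the entropy rate. The brute-force sketch below enumerates every block for a hypothetical binary source with p = (0.9, 0.1). It needs Python 3.8+ for `math.prod`.

```python
import itertools
import math

def entropy(probs):
    """Entropy in bits of a single source symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def per_symbol_length(probs, n):
    """Expected Shannon-code length per symbol when coding i.i.d. blocks of n symbols."""
    total = 0.0
    for block in itertools.product(range(len(probs)), repeat=n):
        p_block = math.prod(probs[i] for i in block)   # i.i.d.: product of marginals
        total += p_block * math.ceil(math.log2(1 / p_block))
    return total / n

probs = [0.9, 0.1]  # hypothetical binary source
H = entropy(probs)
for n in range(1, 9):
    print(n, round(per_symbol_length(probs, n), 4), "upper bound:", round(H + 1 / n, 4))
```

Enumeration grows as |X|^n, so this is only practical for small blocks. It does make the 1/n overhead term visible.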

11 The wrong distribution
Suppose we design a code for the source distribution q(x), but the true distribution is p(x). Can we quantify the loss in the expected length of the 'wrong' code? Shannon code
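For the Shannon code designed for q, with $l(x) = \lceil \log 1/q(x) \rceil$, the expected length under the true distribution satisfies $H(p) + D(p\|q) \le E_p\, l(X) < H(p) + D(p\|q) + 1$. The penalty for using the wrong distribution is therefore the relative entropy $D(p\|q)$. A minimal numerical check, with p and q chosen only for illustration:

```python
import math

def entropy(p):
    """H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """D(p || q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mismatched_length(p, q):
    """Expected length under p of the Shannon code designed for q."""
    return sum(pi * math.ceil(math.log2(1 / qi)) for pi, qi in zip(p, q))

p_true = [0.5, 0.25, 0.125, 0.125]
q_design = [0.25, 0.25, 0.25, 0.25]

H = entropy(p_true)
divergence = kl_divergence(p_true, q_design)
L = mismatched_length(p_true, q_design)
print(f"H(p) = {H}, D(p||q) = {divergence}, E_p[l(X)] = {L}")
```

Here q is dyadic, so no rounding occurs and the lower bound is met with equality: 1.75 + 0.25 = 2 bits.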

12 Shannon code example Shannon code competitive optimality
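Take a hypothetical distribution p = (0.9999, 0.0001), not necessarily the slide's example. The Shannon code lengths $\lceil \log_2 1/p(x) \rceil$ are 1 and 14 bits, while an optimal code uses 1 bit for each symbol. Competitive optimality bounds how often such losses can happen: for any uniquely decodable code with lengths $l'(x)$, $\Pr(l(X) \ge l'(X) + c) \le 2^{1-c}$. The sketch below checks that bound against the optimal lengths.

```python
import math

def shannon_lengths(probs):
    """l(x) = ceil(log2(1/p(x)))."""
    return [math.ceil(math.log2(1 / p)) for p in probs]

def prob_longer_by(probs, lengths, other_lengths, c):
    """Pr(l(X) >= l'(X) + c)."""
    return sum(p for p, l, lo in zip(probs, lengths, other_lengths) if l >= lo + c)

probs = [0.9999, 0.0001]
shannon = shannon_lengths(probs)
optimal = [1, 1]  # with two symbols, one bit per symbol suffices

for c in range(1, 16):
    print(c, prob_longer_by(probs, shannon, optimal, c), 2.0 ** (1 - c))
```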

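13 Shannon-Fano-Elias coding
Cumulative distribution function (CDF) of X: $F(x) = \sum_{a \le x} p(a)$. Modified CDF (the midpoint of each step): $\bar{F}(x) = \sum_{a < x} p(a) + \tfrac{1}{2} p(x)$.
[Figure: CDF of X, marking $F(x)$, $\bar{F}(x)$, and the step of height $p(x)$.]
Performance?
Shannon-Fano-Elias for dyadic distribution: columns x, p(x), F(x), $\bar{F}(x)$, $\bar{F}(x)$ in binary, $l(x) = \lceil \log \frac{1}{p(x)} \rceil + 1$, codeword. What is L(C)? What is H(X)?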

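Shannon-Fano-Elias for general distribution:

  x   p(x)   F(x)   $\bar{F}(x)$   $\bar{F}(x)$ in binary   $l(x) = \lceil \log \frac{1}{p(x)} \rceil + 1$   Codeword
  1   0.25   0.25   0.125          0.001                    3                                                001
  2   0.25   0.5    0.375          0.011                    3                                                011
  3   0.2    0.7    0.6            0.10011...               4                                                1001
  4   0.15   0.85   0.775          0.1100011...             4                                                1100
  5   0.15   1.0    0.925          0.1110110...             4                                                1110

What is L(C)? What is H(X)?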
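The construction in the table can be reproduced with exact rational arithmetic. Take $\bar{F}(x) = F(x-1) + p(x)/2$ and keep the first $l(x) = \lceil \log_2 1/p(x) \rceil + 1$ bits of its binary expansion. In the sketch below, `fractions.Fraction` is our design choice: it avoids the rounding errors that floats would introduce into the binary expansion. Symbols are coded in the order given.

```python
from fractions import Fraction

def shannon_length(p):
    """Smallest integer l with 2**-l <= p, i.e. ceil(log2(1/p)), computed exactly."""
    l = 0
    while Fraction(1, 2 ** l) > p:
        l += 1
    return l

def binary_digits(x, n):
    """First n bits after the binary point of a number 0 <= x < 1."""
    bits = []
    for _ in range(n):
        x *= 2
        bit = 1 if x >= 1 else 0
        bits.append(str(bit))
        x -= bit
    return "".join(bits)

def sfe_code(probs):
    """Shannon-Fano-Elias codewords: truncate F_bar(x) to ceil(log2 1/p(x)) + 1 bits."""
    codes, F_prev = [], Fraction(0)
    for p in probs:
        F_bar = F_prev + p / 2
        codes.append(binary_digits(F_bar, shannon_length(p) + 1))
        F_prev += p
    return codes

probs = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 5), Fraction(3, 20), Fraction(3, 20)]
codes = sfe_code(probs)
expected_length = sum(p * len(c) for p, c in zip(probs, codes))
print(codes, float(expected_length))
```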

14 Huffman codes
Huffman discovered a simple algorithm for constructing optimal (shortest expected length) codes for any given distribution.
Example 5.6.1. Let X = {1, 2, 3, 4, 5} with probabilities 0.25, 0.25, 0.2, 0.15, 0.15, respectively. We expect the optimal binary code for X to have the …
[Table: columns codeword length, codeword, X, probability.]
H(X), L(C)?

15 Huffman codes
Purpose of dummy symbols? Number of dummy symbols?
[Figure: Huffman code of the English language: a table with columns $a_i$, $p_i$, $\log_2(1/p_i)$, $l_i$, $c(a_i)$ for the letters a to z, and the corresponding Huffman code tree. (Mackay textbook, Ch. 5)]
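In a D-ary Huffman code, each merge combines D symbols and so reduces the count by D − 1. The procedure therefore ends at a single root only if the number of symbols is 1 + k(D − 1). Dummy symbols of probability 0 pad the alphabet to that size; for m symbols, (1 − m) mod (D − 1) of them are needed. The heap-based sketch below returns codeword lengths only. The ternary distribution in the example is hypothetical.

```python
import heapq
import itertools

def dary_huffman_lengths(probs, D):
    """Codeword lengths of a D-ary Huffman code, plus the number of dummy symbols used."""
    m = len(probs)
    n_dummy = (1 - m) % (D - 1)
    tie = itertools.count()  # tie-breaker so heapq never compares the index lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heap += [(0.0, next(tie), []) for _ in range(n_dummy)]  # dummies own no real symbol
    heapq.heapify(heap)
    lengths = [0] * m
    while len(heap) > 1:
        merged_p, merged = 0.0, []
        for _ in range(D):                  # merge the D least likely nodes
            p, _tie, group = heapq.heappop(heap)
            merged_p += p
            merged += group
        for i in merged:                    # every symbol below this node gets one digit longer
            lengths[i] += 1
        heapq.heappush(heap, (merged_p, next(tie), merged))
    return lengths, n_dummy

probs = [0.25, 0.25, 0.2, 0.1, 0.1, 0.1]   # hypothetical ternary example
lengths, n_dummy = dary_huffman_lengths(probs, 3)
print(lengths, n_dummy, sum(p * l for p, l in zip(probs, lengths)))
```

Without the dummy symbol, the final merge would combine fewer than D nodes. A D-ary symbol would then be wasted near the root instead of at the least likely leaves.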

16 Huffman codes
C is a Huffman code.
Constructing Huffman codes: a Huffman code is obtained by repeatedly "merging" the last two symbols, assigning to them the "last codeword minus the last bit", and reordering the symbols so that their probabilities (or weights) are non-increasing.
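The merge-and-reorder procedure above translates almost line for line into a recursive function. Sort the weights in non-increasing order and merge the last two. Code the reduced list. The merged symbol's codeword is then the parent, and its two children receive the parent plus 0 or 1. A sketch, applied to the distribution (0.25, 0.25, 0.2, 0.15, 0.15) from the Huffman example in this chapter; the function name is ours.

```python
import math

def huffman_codewords(weights):
    """Binary Huffman codewords via repeated merging of the two least likely symbols."""
    n = len(weights)
    if n == 1:
        return ["0"]
    if n == 2:
        return ["0", "1"]
    order = sorted(range(n), key=lambda i: weights[i], reverse=True)  # non-increasing
    a, b = order[-2], order[-1]                  # the last two (least likely) symbols
    rest = order[:-2]
    reduced = [weights[i] for i in rest] + [weights[a] + weights[b]]
    reduced_codes = huffman_codewords(reduced)   # code the reduced source
    codes = [None] * n
    for pos, i in enumerate(rest):
        codes[i] = reduced_codes[pos]
    parent = reduced_codes[-1]                   # the merged symbol's codeword
    codes[a], codes[b] = parent + "0", parent + "1"
    return codes

probs = [0.25, 0.25, 0.2, 0.15, 0.15]
codes = huffman_codewords(probs)
L = sum(p * len(c) for p, c in zip(probs, codes))
H = -sum(p * math.log2(p) for p in probs)
print(codes, round(L, 4), round(H, 4))
```

This version re-sorts at every step for clarity, which costs O(m² log m). A priority queue, as in a heap-based implementation, brings the cost down to O(m log m) while producing codes of the same expected length.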

17 Comments on Huffman codes
Equivalence of Huffman coding and 20 questions? Huffman coding versus Shannon coding? Strengths? Weaknesses? Rigorous proof of Huffman optimality.
