Data Compression - Seminar 4

October 29, 2013

Problem 1 (Uniquely decodable and instantaneous codes) Let $L = \sum_i p_i l_i^{100}$ be the expected value of the 100th power of the word lengths associated with an encoding of the random variable $X$. Let $L_1 = \min L$ over all instantaneous codes, and let $L_2 = \min L$ over all uniquely decodable codes. What inequality relationship exists between $L_1$ and $L_2$?

Solution. We have
$$L = \sum_i p_i l_i^{100}, \qquad L_1 = \min\{L : \text{instantaneous codes}\}, \qquad L_2 = \min\{L : \text{uniquely decodable codes}\}.$$
Since every instantaneous code is uniquely decodable, the minimum defining $L_2$ is taken over a larger class of codes, so $L_1 \ge L_2$. Conversely, any set of codeword lengths achieving the minimum $L_2$ satisfies the Kraft inequality, and hence we can construct an instantaneous code with the same codeword lengths and therefore the same $L$; hence $L_1 \le L_2$. From both conditions together, we must have $L_1 = L_2$.

Problem 2 (How many fingers has a Martian?) Let
$$S = \begin{pmatrix} S_1 & \dots & S_m \\ p_1 & \dots & p_m \end{pmatrix}.$$
The $S_i$'s are encoded into strings from a $D$-symbol output alphabet in a uniquely decodable manner. If $m = 6$ and the codeword lengths are $(l_1, l_2, \dots, l_6) = (1, 1, 2, 3, 2, 3)$, find a good lower bound on $D$. You may wish to explain the title of the problem.

Solution. Uniquely decodable codes satisfy the Kraft inequality. Therefore
$$f(D) = D^{-1} + D^{-1} + D^{-2} + D^{-3} + D^{-2} + D^{-3} \le 1.$$
We have $f(2) = 7/4 > 1$, hence $D > 2$. We have $f(3) = 26/27 \le 1$, so a possible value of $D$ is 3. Our counting system is base 10, probably because we have 10 fingers. Perhaps the Martians were using a base 3 representation because they have 3 fingers. (Maybe they are like Maine lobsters?)
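As a quick numerical check of this bound, the following Python sketch (the helper name kraft_sum and the range of candidate alphabet sizes are illustrative choices, not part of the problem) evaluates the Kraft sum for the given lengths and reports which alphabet sizes satisfy it.

```python
# Sketch: evaluate the Kraft sum f(D) = sum_i D^(-l_i) for the lengths in Problem 2
# and report which integer alphabet sizes D satisfy f(D) <= 1.
lengths = [1, 1, 2, 3, 2, 3]

def kraft_sum(D, lengths):
    """Kraft sum for a D-ary alphabet and the given codeword lengths."""
    return sum(D ** (-l) for l in lengths)

for D in range(2, 6):
    f = kraft_sum(D, lengths)
    print(f"D = {D}: f(D) = {f:.4f}  ({'OK' if f <= 1 else 'violates Kraft'})")
# Expected: D = 2 violates the inequality (f = 1.75); D = 3 satisfies it (f = 26/27).
```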

Problem 3 (Slackness in the Kraft inequality) An instantaneous code has word lengths $l_1, l_2, \dots, l_m$ which satisfy the strict inequality
$$\sum_{i=1}^{m} D^{-l_i} < 1.$$
The code alphabet is $\mathcal{D} = \{0, 1, 2, \dots, D-1\}$. Show that there exist arbitrarily long sequences of code symbols from $\mathcal{D}$ which cannot be decoded into sequences of codewords.

Solution. Instantaneous codes are prefix-free codes, i.e., no codeword is a prefix of any other codeword. Let $l_{\max} = \max\{l_1, l_2, \dots, l_m\}$. There are $D^{l_{\max}}$ sequences of length $l_{\max}$. Of these sequences, $D^{l_{\max} - l_i}$ start with the $i$-th codeword. Because of the prefix condition, no sequence can start with two different codewords. Hence the total number of sequences of length $l_{\max}$ which start with some codeword is
$$\sum_{i=1}^{m} D^{l_{\max} - l_i} = D^{l_{\max}} \sum_{i=1}^{m} D^{-l_i} < D^{l_{\max}}.$$
Hence there are sequences of length $l_{\max}$ which do not start with any codeword. These, and all longer sequences having these length-$l_{\max}$ sequences as prefixes, cannot be decoded. (This situation can be visualized with the aid of a code tree.)

Alternatively, we can map codewords onto intervals of the real line: a codeword corresponds to the set of numbers in $[0,1)$ whose base-$D$ expansions start with that codeword. Since the length of the interval for a codeword of length $l_i$ is $D^{-l_i}$, and $\sum_i D^{-l_i} < 1$, there exist some interval(s) not covered by any codeword. The $D$-ary sequences in these intervals do not begin with any codeword and hence cannot be decoded.
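To make the counting argument concrete, here is a small Python sketch; the binary code {0, 10}, whose Kraft sum is 3/4 < 1, is an illustrative example of my own choosing rather than one taken from the problem. The sketch enumerates all strings of length $l_{\max}$ and flags those that do not start with any codeword; every extension of a flagged string is undecodable.

```python
from itertools import product

# Illustrative prefix code over D = {0, 1} with Kraft sum 1/2 + 1/4 = 3/4 < 1.
code = ["0", "10"]
l_max = max(len(c) for c in code)

# Enumerate all binary strings of length l_max and keep those that do not
# start with any codeword; every extension of such a string is undecodable.
undecodable_starts = [
    "".join(s)
    for s in product("01", repeat=l_max)
    if not any("".join(s).startswith(c) for c in code)
]
print(undecodable_starts)  # expected: ['11'] -- e.g. 11, 110, 1100, ... cannot be decoded
```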

Problem 4 (Conditions for unique decodability) Prove that a code $C$ is uniquely decodable if (and only if) the extension
$$C^k(x_1, x_2, \dots, x_k) = C(x_1)C(x_2)\cdots C(x_k)$$
is a one-to-one mapping from $\mathcal{X}^k$ to $\mathcal{D}^*$ for every $k \ge 1$. (The "only if" part is obvious.)

Solution. If $C^k$ is not one-to-one for some $k$, then $C$ is not uniquely decodable, since there exist two distinct sequences $(x_1, \dots, x_k)$ and $(x'_1, \dots, x'_k)$ such that
$$C^k(x_1, \dots, x_k) = C(x_1)\cdots C(x_k) = C(x'_1)\cdots C(x'_k) = C^k(x'_1, \dots, x'_k).$$
Conversely, if $C$ is not uniquely decodable, then by definition there exist distinct sequences of source symbols $(x_1, \dots, x_i)$ and $(y_1, \dots, y_j)$ such that
$$C(x_1)C(x_2)\cdots C(x_i) = C(y_1)C(y_2)\cdots C(y_j).$$
Concatenating the input sequences $(x_1, \dots, x_i)$ and $(y_1, \dots, y_j)$ in the two possible orders, we obtain
$$C(x_1)\cdots C(x_i)C(y_1)\cdots C(y_j) = C(y_1)\cdots C(y_j)C(x_1)\cdots C(x_i),$$
which shows that $C^k$ is not one-to-one for $k = i + j$.
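The equivalence can be explored numerically for small $k$. The Python sketch below (the function name and the sample code are my own; a brute-force search over small $k$ can only exhibit a failure of unique decodability, never prove it) checks whether the $k$-th extension is one-to-one by encoding every length-$k$ source sequence.

```python
from itertools import product

def extension_is_one_to_one(code, k):
    """Brute-force check that the k-th extension C^k is one-to-one.

    `code` maps source symbols to codeword strings. Returns False iff two
    distinct length-k source sequences encode to the same string.
    """
    seen = {}
    for seq in product(code.keys(), repeat=k):
        encoded = "".join(code[s] for s in seq)
        if encoded in seen and seen[encoded] != seq:
            return False
        seen[encoded] = seq
    return True

# The non-UD code {0, 01, 10} from Problem 5(c)(iii): the string 010 has two parsings.
bad = {"A": "0", "B": "01", "C": "10"}
print([extension_is_one_to_one(bad, k) for k in range(1, 4)])  # expected: [True, False, False]
```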

Problem 5 (The Sardinas-Patterson test for unique decodability) A code is not uniquely decodable if and only if there exists a finite sequence of code symbols which can be resolved in two different ways into sequences of codewords. That is, a situation must occur in which the same string of code symbols can be written both as $A_1 A_2 A_3 \cdots$ and as $B_1 B_2 B_3 \cdots$, where each $A_i$ and each $B_i$ is a codeword. Note that $B_1$ must be a prefix of $A_1$ with some resulting dangling suffix. Each dangling suffix must in turn be either a prefix of a codeword or have another codeword as its prefix, resulting in another dangling suffix. Finally, the last dangling suffix in the sequence must itself be a codeword. Thus one can set up a test for unique decodability (which is essentially the Sardinas-Patterson test [1]) in the following way: construct a set $S$ of all possible dangling suffixes. The code is uniquely decodable if and only if $S$ contains no codeword.

(a) State the precise rules for building the set $S$.

(b) Suppose the codeword lengths are $l_i$, $i = 1, 2, \dots, m$. Find a good upper bound on the number of elements in the set $S$.

(c) Determine which of the following codes is uniquely decodable:
(i) {0, 10, 11}.
(ii) {0, 01, 11}.
(iii) {0, 01, 10}.
(iv) {0, 01}.
(v) {00, 01, 10, 11}.
(vi) {110, 11, 10}.
(vii) {110, 11, 100, 00, 10}.

(d) For each uniquely decodable code in part (c), construct, if possible, an infinite encoded sequence with a known starting point, such that it can be resolved into codewords in two different ways. (This illustrates that unique decodability does not imply finite decodability.) Prove that such a sequence cannot arise in a prefix code.

Solution. The proof of the Sardinas-Patterson test has two parts. In the first part, we show that if there is a code string that has two different interpretations, then the code will fail the test. The simplest case is when the concatenation of two codewords yields another codeword. In this case, $S_2$ will contain a codeword, and hence the test will fail. In general, the code is not uniquely decodable iff there exists a string that admits two different parsings into codewords, e.g.,
$$x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 = (x_1 x_2)(x_3 x_4 x_5)(x_6 x_7 x_8) = (x_1 x_2 x_3 x_4)(x_5 x_6 x_7 x_8).$$
In this case, $S_2$ will contain the string $x_3 x_4$, $S_3$ will contain $x_5$, and $S_4$ will contain $x_6 x_7 x_8$, which is a codeword. It is easy to see that this procedure works for any string that has two different parsings into codewords; a formal proof is slightly more difficult and uses induction.

In the second part, we show that if there is a codeword in one of the sets $S_i$, $i \ge 2$, then there exists a string with two different possible interpretations, thus showing that the code is not uniquely decodable. To do this, we essentially reverse the construction of the sets. We will not go into the details here; the reader is referred to the original paper.

(a) Let $S_1$ be the original set of codewords. We construct $S_{i+1}$ from $S_i$ as follows: a string $y$ is in $S_{i+1}$ iff there is a codeword $x$ in $S_1$ such that $xy$ is in $S_i$, or there exists a $z \in S_i$ such that $zy$ is in $S_1$ (i.e., $zy$ is a codeword). The code is uniquely decodable iff none of the sets $S_i$, $i \ge 2$, contains a codeword. Thus the set of dangling suffixes is $S = \bigcup_{i \ge 2} S_i$.

(b) A simple upper bound follows from the fact that all strings in the sets $S_i$ have length less than $l_{\max}$; therefore the number of elements in $S$ is less than $2^{l_{\max}}$ for a binary code alphabet (and less than $D^{l_{\max}}$ in general).

(c) (i) {0, 10, 11}: This code is instantaneous and hence uniquely decodable.
(ii) {0, 01, 11}: This code is a suffix code (no codeword is a suffix of any other), and suffix codes are uniquely decodable. The sets in the Sardinas-Patterson test are $S_1 = \{0, 01, 11\}$, $S_2 = \{1\} = S_3 = S_4 = \dots$.
(iii) {0, 01, 10}: This code is not uniquely decodable. The sets in the test are $S_1 = \{0, 01, 10\}$, $S_2 = \{1\}$, $S_3 = \{0\}, \dots$. Since 0 is a codeword, this code fails the test. It is also easy to see directly that the code is not UD: the string 010 has two valid parsings.
(iv) {0, 01}: This code is a suffix code and is therefore UD. The test produces the sets $S_1 = \{0, 01\}$, $S_2 = \{1\}$, $S_3 = \emptyset$.
(v) {00, 01, 10, 11}: This code is instantaneous and therefore UD.
(vi) {110, 11, 10}: This code is uniquely decodable by the Sardinas-Patterson test, since $S_1 = \{110, 11, 10\}$, $S_2 = \{0\}$, $S_3 = \emptyset$.
(vii) {110, 11, 100, 00, 10}: This code is UD, because by the Sardinas-Patterson test $S_1 = \{110, 11, 100, 00, 10\}$, $S_2 = \{0\}$, $S_3 = \{0\}$, etc.
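The rules in part (a) translate directly into a short program. The Python sketch below (the function name is my own choice) iterates the suffix sets $S_2, S_3, \dots$ until they repeat or become empty, and reproduces the classifications given in part (c).

```python
def sardinas_patterson(code):
    """Sardinas-Patterson test: True iff `code` (a set of strings) is uniquely decodable.

    Follows the rules in part (a): S_1 is the codeword set; y is in S_{i+1} iff
    xy is in S_i for some codeword x, or zy is a codeword for some z in S_i.
    The code is UD iff no S_i with i >= 2 contains a codeword.
    """
    S1 = set(code)
    # S_2: dangling suffixes left when one codeword is a proper prefix of another.
    current = {w[len(x):] for x in S1 for w in S1 if w != x and w.startswith(x)}
    seen = set()
    while current:
        if current & S1:                  # a dangling suffix equals a codeword -> not UD
            return False
        seen |= current
        nxt = set()
        for w in current:
            for x in S1:
                if w.startswith(x) and len(w) > len(x):   # codeword x is a prefix of suffix w
                    nxt.add(w[len(x):])
                if x.startswith(w) and len(x) > len(w):   # suffix w is a prefix of codeword x
                    nxt.add(x[len(w):])
        current = nxt - seen              # stop once no new suffixes appear
    return True

# Check the seven codes from part (c).
codes = [{"0", "10", "11"}, {"0", "01", "11"}, {"0", "01", "10"}, {"0", "01"},
         {"00", "01", "10", "11"}, {"110", "11", "10"}, {"110", "11", "100", "00", "10"}]
for c in codes:
    print(sorted(c), "UD" if sardinas_patterson(c) else "not UD")
# Expected: only {0, 01, 10} is reported "not UD".
```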

(d) We can produce infinite strings which can be decoded in two ways only for those examples where the Sardinas-Patterson test produces a repeating set. For example, in part (ii), the string 011111... can be parsed either as 0, 11, 11, ... or as 01, 11, 11, .... Similarly, for (vii), the string 10000... can be parsed as 100, 00, 00, ... or as 10, 00, 00, .... For the instantaneous (prefix) codes, it is not possible to construct such a string: since no codeword is a prefix of another, the decoder can commit to a codeword as soon as it sees one, and never has to wait before resolving the parsing.

References

[1] A. A. Sardinas and G. W. Patterson. A necessary and sufficient condition for the unique decomposition of coded messages. In IRE Convention Record, Part 8, pages 104-108, 1953.