Complete Variable-Length "Fix-Free" Codes

Similar documents
CSC 310, Fall 2011 Solutions to Theory Assignment #1

Lecture 2. 1 Introduction. 2 The Set Cover Problem. COMPSCI 632: Approximation Algorithms August 30, 2017

Data Compression - Seminar 4

Verifying a Border Array in Linear Time

Scribe: Virginia Williams, Sam Kim (2016), Mary Wootters (2017) Date: May 22, 2017

Information Theory and Communication

Reflection in the Chomsky Hierarchy

Group Secret Key Generation Algorithms

Lecture 15. Error-free variable length schemes: Shannon-Fano code

The Encoding Complexity of Network Coding

Learning Decision Lists

On the Relationships between Zero Forcing Numbers and Certain Graph Coverings

Computer Science 236 Fall Nov. 11, 2010

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 4, APRIL

Disjoint directed cycles

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Chapter 5 Lempel-Ziv Codes To set the stage for Lempel-Ziv codes, suppose we wish to nd the best block code for compressing a datavector X. Then we ha

Lecture 1. 1 Notation

Definition For vertices u, v V (G), the distance from u to v, denoted d(u, v), in G is the length of a shortest u, v-path. 1

Pierre A. Humblet* Abstract

Horn Formulae. CS124 Course Notes 8 Spring 2018

February 24, :52 World Scientific Book - 9in x 6in soltys alg. Chapter 3. Greedy Algorithms

Complexity Theory. Compiled By : Hari Prasad Pokhrel Page 1 of 20. ioenotes.edu.np

Treewidth and graph minors

6. Finding Efficient Compressions; Huffman and Hu-Tucker

An Improved Upper Bound for the Sum-free Subset Constant

Maximal Monochromatic Geodesics in an Antipodal Coloring of Hypercube

1. Chapter 1, # 1: Prove that for all sets A, B, C, the formula

Digital Communication Prof. Bikash Kumar Dey Department of Electrical Engineering Indian Institute of Technology, Bombay

Greedy Algorithms CSE 780

1 Counting triangles and cliques

IS BINARY ENCODING APPROPRIATE FOR THE PROBLEM-LANGUAGE RELATIONSHIP?

Spanners of Complete k-partite Geometric Graphs

Forced orientation of graphs

Graph Connectivity G G G

A PROOF OF THE LOWER BOUND CONJECTURE FOR CONVEX POLYTOPES

Adjacent: Two distinct vertices u, v are adjacent if there is an edge with ends u, v. In this case we let uv denote such an edge.

Midterm 1 Solutions. (i) False. One possible counterexample is the following: n n 3

Basic Combinatorics. Math 40210, Section 01 Fall Homework 4 Solutions

Algorithms Dr. Haim Levkowitz

On Covering a Graph Optimally with Induced Subgraphs

Topology Homework 3. Section Section 3.3. Samuel Otten

Almost all Complete Binary Prefix Codes have a Self-Synchronizing String

Greedy Algorithms. CLRS Chapters Introduction to greedy algorithms. Design of data-compression (Huffman) codes

CSC 373: Algorithm Design and Analysis Lecture 4

Combinatorial Gems. Po-Shen Loh. June 2009

Superconcentrators of depth 2 and 3; odd levels help (rarely)

Bipartite Roots of Graphs

Greedy Algorithms CHAPTER 16

Optimal Variable Length Codes (Arbitrary Symbol Cost and Equal Code Word Probability)* BEN VARN

Solutions to In Class Problems Week 5, Wed.

Parameterized graph separation problems

On Sequential Topogenic Graphs

Math 170- Graph Theory Notes

Course notes for Data Compression - 2 Kolmogorov complexity Fall 2005

Basics of Graph Theory

Edge Colorings of Complete Multipartite Graphs Forbidding Rainbow Cycles

ENSC Multimedia Communications Engineering Huffman Coding (1)

Data Compression Techniques

1. Suppose you are given a magic black box that somehow answers the following decision problem in polynomial time:

Universal Cycles for Permutations

arxiv: v1 [cs.cg] 4 Dec 2007

Notes for Recitation 8

An Efficient Decoding Technique for Huffman Codes Abstract 1. Introduction

Solutions to Homework 10

Shannon capacity and related problems in Information Theory and Ramsey Theory

COMPSCI 650 Applied Information Theory Feb 2, Lecture 5. Recall the example of Huffman Coding on a binary string from last class:

A Fast Algorithm for Optimal Alignment between Similar Ordered Trees

4 Generating functions in two variables

Few T copies in H-saturated graphs

The External Network Problem

We will show that the height of a RB tree on n vertices is approximately 2*log n. In class I presented a simple structural proof of this claim:

Ramsey s Theorem on Graphs

UC San Diego UC San Diego Previously Published Works

This is already grossly inconvenient in present formalisms. Why do we want to make this convenient? GENERAL GOALS

Bipartite Coverings and the Chromatic Number

Variable-length Splittable Codes with Multiple Delimiters

CMPSCI 311: Introduction to Algorithms Practice Final Exam

Greedy Algorithms CSE 6331

Multiple Vertex Coverings by Cliques

Multi-Cluster Interleaving on Paths and Cycles

2 The Fractional Chromatic Gap

ON WEAKLY COMPACT SUBSETS OF BANACH SPACES

Text Compression through Huffman Coding. Terminology

and Heinz-Jürgen Voss

Flexible Coloring. Xiaozhou Li a, Atri Rudra b, Ram Swaminathan a. Abstract

Greedy Algorithms 1 {K(S) K(S) C} For large values of d, brute force search is not feasible because there are 2 d {1,..., d}.

(Received Judy 13, 1971) (devised Nov. 12, 1971)

Lecture 6,

Decreasing the Diameter of Bounded Degree Graphs

Abstract. A graph G is perfect if for every induced subgraph H of G, the chromatic number of H is equal to the size of the largest clique of H.

A Reduction of Conway s Thrackle Conjecture

CHAPTER 10 GRAPHS AND TREES. Alessandro Artale UniBZ - artale/

arxiv:submit/ [math.co] 9 May 2011

[Ch 6] Set Theory. 1. Basic Concepts and Definitions. 400 lecture note #4. 1) Basics

Introduction to Algorithms I

CL i-1 2rii ki. Encoding of Analog Signals for Binarv Symmetric Channels A. J. BERNSTEIN, MEMBER, IEEE, K. STEIGLITZ, MEMBER,

Compressing Data. Konstantin Tretyakov

Discrete mathematics , Fall Instructor: prof. János Pach

Transcription:

Designs, Codes and Cryptography, 5, 109-114 (1995) 9 1995 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands. Complete Variable-Length "Fix-Free" Codes DAVID GILLMAN* gillman @ es.toronto.edu Computer Science Department, University of Toronto, Toronto, Ontario Canada MSSIA4 RONALD L. RIVEST** rivest@theory.lcs.mit.edu Laboratory for Computer Science, Massachusetts btstitute of Technology, Cambridge, MA 02139 Communicated by: S. Vanstone Received July 26, 1993; Revised May 11, 1994 Abstract. A set of codewords isfix-free if it is both prefix-free and suffix-free: no codeword in the set is a prefix or a suffix of any other. A set of codewords {Xl, x2... x, } over a t-letter alphabet E is said to be complete if it satisfies the Kraft inequality with equality, so that ~t-lxl] =. 1 l<_i<n The set E k of all codewords of length k is obviously both fix-free and complete. We show, surprisingly, that there are other examples of complete fix-free codes, ones whose codewords have a variety of lengths. We discuss such variable-length (complete) fix-free codes and techniques for constructing them. 1. Introduction While investigating the properties of Huffman codes [1], we became interested in codes that were both prefix-free and suffix-free (or "fix-free"). Fix-free codes have a number of interesting properties. For example, a word constructed by concatenating together a number of codewords from a fix-free code can be uniquely parsed from either end. (And fix-free codes have terrible synchronization properties: being suffix-free means that if the channel drops a bit the receiver can not easily resynchronize. 1) However, it was not clear to us whether fix-free codes could be as efficient as ordinary prefix-free codes. We thus became interested in complete "fix-free" codes, as defined in the abstract above. Preliminary investigation led us to conjecture that a complete fix-free code over a given t-letter alphabet Z must be of the form Z k for some integer k. Conjecture. A complete fix-free code over a given t-letter alphabet Z must be of the form E k for some integer k. That is, there are no variable-length complete fix-free codes. We attempted to prove this conjecture, but were surprised to find out that it is false. Here is the first counterexample we found, over a binary alphabet: A = {01,000, io0, ii0, iii, 0010, 0011, i010, i011} The "prefix tree" and "suffix tree" for the code A are given in Figure 1. * The first author was supported by NSF grant 9212184-CCR and Darpa contract N00014-92-J-1799. ** Supported by NSF grants CCR-8914428 and CCR-9310888, and the Siemens Corporation.

110 DAVID GILLMAN AND RONALD L. RIVEST o o I I (b) Figure 1. (a) Prefix tree and (b) suffix tree for fix-free code A = {01, 000,100, 110,111, 0010, 0011, i010, i011}. Any such counterexample to the conjecture generates a family of counterexamples by forming products: let A k be the set of concatenations of k words from A. Since A is fix-free, A k must also be fix-free. Given that the conjecture is false, it is natural to ask for general techniques for constructing fix-free codes, much as one can construct Huffman codes. However, the problem of constructing fix-free codes seems much more difficult, and we do not know how to generate all such codes. For use in applications, it would seem advantageous to have codes with arbitrarily large ratios between the lengths of the longest and shortest codewords. In the set A just described, the ratio of the lengths of the longest and shortest words is 2. In the next section we show that A is an example of a general construction that gives a ratio of longest word to shortest word approaching 3. We further generalize this by developing a recursive construction which gives arbitrarily large ratios. We leave as an open problem the question of constructing "efficient" fix-free codes, given a probability distribution on a source alphabet. 2. General Construction Techniques 2.1. First Construction Method Here is a general construction of a complete fix-free code A over a t-letter alphabet E. THEOREM 1 Let S be any subset of Ek, and p be a fixed positive integer less than k, such that for any two (not necessarily distinct) words ors (say, x = x[1]x[2]... x[k ] and

COMPLETE VARIABLE-LENGTH "FIX-FREE" CODES 1 1 1 y = y[1]y[2]... y{k]), we have that x[i] r y[i + p]for some i, 1 < i < k - p. (1) Then the set of codewords Fk.p(S ) = S U ZPS~ p U (E k+p -S~ p -EPS) (2) is both fix-free and complete. Proof Condition (1) ensures that no element of S can be a prefix or a suffix of any string in ]~PS~ p. It is immediate that no element of S is a prefix or a suffix of any string in (Nk+p _ SNp - ZPS), and that no element of (E ~+p - SNp - EPS) is a prefix or a suffix of any string in ~PS~ p. Thus Ft,p(S) is fix-free. To show that Fk,p (S) is complete, we verify that the Kraft inequality is tight. Condition (1) ensures that SZP and EPS are disjoint. Let s = ISI. Then Fk,p(S) contains s words of length k, st 2p words of length k + 2p, and t k+p - st p+1 words of length k + p. We thus have Z x~f~..p(s) t_lx I so that Fk,p is complete. = st -k + st2pt -k-2p + (t t+p - stp+l)t -k-p (3) = st -k + st -k + (1 - st -k+l) (4) = 1, (5) The ratio of the longest word to the shortest word in this fix-free code is (2p + k)/k; if we choose p = k - I (which we certainly can do for some S), the ratio is 3-2/k, which approaches 3 as k goes to infinity. It is possible to pick a set S satisfying condition (1) for any k and any p < k; for example, let S be a set containing a single element x for which xl ~ Xp+l. In our example from the previous section the set A arises by letting S = {01} and p = 1 (so that k = 2). [] 2.2. Second (Recursive) Construction Method We now generalize the above construction. We describe how to construct a complete fixfree code recursively and prove that the code is fix-free as long as there exists a number m and sets of codewords $1... Sr such that: 1. For i = 1, 2... r, we have that Si is a subset of Nk~, and kl <k2 <'"<kr <m. 2. The set S = $1 U.- 9 U Sr is fix-free, but not complete. 3. Ifj 7 ~ i + 1 then SjY, m-kj 71 ~m-k~s i = 0.

- kr_(i_l) 1 12 DAVID GILLMAN AND RONALD L. RIVEST 4. There is a sequence of strings Xl, x2..... x r such that xi ~ Si for 1 < i < r, and xi is a suffix of some string in xi+l E rn-k~+~ for 1 < i < r - 1. Remark. Conditions (1)-(4) are the most general we know of to ensure the validity of the construction that follows. The special case to keep in mind while reading the construction is the one in which Si = {xi } for every i. We illustrate this special case in the next section. We assume that SI... Sr satisfy conditions (1)-(4), and using them we construct a complete fix-free code Tr as follows. Step O. Let To denote S U (Em - SE*); thus To is the union of S with the set of all codewords of length m that do not begin with a word from S. Step i, for i = 1, 2... r. Let T/denote the set of strings in T/_], except that each string x of T/_I having a proper suffix in some Sj is replaced by xem-~j. (Each such word x is replaced by t m-kj words of length Ixl + m - kj.) Since S is suffix-free by condition (2) no string of T/_I has more than one such suffix, so this step is well-defined. It is easy to argue by induction that each T/is both prefix-free and complete. (In fact it is helpful to think of T/as a prefix tree, with certain branches being extended to create T/+I.) Therefore, it only remains to argue that T~ is suffix-free. Claim 1. Let0 < i < r. Ifx 6 Ti is a proper suffix of some string in T/ then x c S1 U 9 9 9 U Sr-i. In particular, Tr is fix-free. Proof. It will suffice to show that each new string created in step i contains no proper suffix from among (To U... U Tr) \ ($1 U... U Sr-i). We prove this by induction on i. The case i = 0 is obvious. Suppose a string x was created in Step i > 0 by appending some string in E m-kj. By induction we have j < r - (i - 1). By condition (3), ifx has a proper suffix y 6 S, then y must be an element of Sj-1, which is contained in $1 U... U Sr-i. We must now show that x has no proper suff from among (To U... U Tr) \ S. Suppose for purposes of contradiction that y 6 (To U... U Tr) \ S and y is a proper suffix of x. The last m letters of y have an element of Sj as a prefix; therefore y was created at some step i' > 0 by appending some string in Em-kj. Snip the last m - kj letters off x and y to get x' and y'. Then x' was created at step i - 1 and yl was created at step i' - 1 and yl is a suffix of x'. Also, y' r S, which contradicts the inductive hypothesis. [] Claim 2. The longest word in Tr has length m + ]~;=l(m - kj). Proof. We show by induction that for 0 < i < r - 1, the string Xr_ i given by condition (4) is a suffix of some string in T/of length m + I]~= 1 (m - kr-(j-1)). The case i = 0 follows from condition (3). Suppose i > 0. By induction Xr_(i_l) is a suffix of some string in T/_I of length m + ~-1 l (m - kr-(j-1)). The construction of T/ appends all strings of length m - to this string, and by condition (4) one of the new strings so created has Xr_ i as a suffix. This completes the induction.

COMPLETE VARIABLE-LENGTH "FIX-FREE" CODES l 13 ~]r-1 [in We have shown that xl is a suffix of some string of Tr-t of length m + )=1 t - kr-(j-1)). Now the construction of Tr appends all strings of length m - kl. This creates strings in Tr of length m + N~= 1 (m - kj). By a similar argument and condition (3) every string created S t after step 0 has length m + Ej=,(m - kj) for some 1 < s < s' < r. Therefore, the strings whose existence we just established are the longest in Tr. [] The construction in Theorem 1 is a special case of the construction of Tr given here, with r = 1, S = SI, k = kl, and m = k + p. Hypothesis (1) in Theorem 1 implies that condition (3) holds. 3. Example In this section we construct, for each whole number n, a binary fix-free code in which the longest word has length (5n 2 + 13n + 2)/2 and the shortest word has length 3n + 1. The ratio of longest word to shortest word is therefore greater than 5n/6. It is possible to improve the constants slightly with a little effort; that is, to attain a larger ratio for the same maximum length string. Let n _> I be an integer. Referring to the notation of the previous section, set m = 6n + 1 and r = n. For 1 < i < r, put ki = 3n + i and xi = 02i-llo 3n-i-ll. The collection of singleton sets Si = {xi }, i = t... r, clearly satisfies conditions (1)-(4). The construction of the fix-free code proceeds as in the previous section. 4. Further Work There are fix-free codes unaccounted for by the construction of this paper. (We have written a C program to search for such codes; this program and its output are available from the authors.) We do not know how common fix-free codes are among the complete prefix-free codes. It would be interesting to find an upper bound on the average codeword length of an optimal fix-free encoding of an arbitrary n-letter source alphabet; for example, some constant multiple of the entropy of the source alphabet. Acknowledgment We would like to thank Peter Elias and Mojdeh Mohtashemi for helpful discussions. Notes 1. The receiver first parses the transmission into codewords and then decodes each one. Because of the suffix-free property, the receiver will parse incorrectly from the dropped bit onward, and the end of a parsed word will never coincide with the end of a source word. This is what we mean by failure to resynchronize.

114 DAVID GILLMAN AND RONALD L. RIVEST References 1. David A. Huffman, A method for the construction of minimum-redundancy codes, Proceedings of the IRE, Vol. 40, No. 9 (1952) pp. 1098-1101.