Chapter 5 Lempel-Ziv Codes

To set the stage for Lempel-Ziv codes, suppose we wish to find the best block code for compressing a datavector X. Then we have to take into account the complexity of the code. We could represent the total number of codebits at the decoder output as

  [# of codebits to describe block code] + [# of codebits from using code on X]

The codebits used to describe the block code that is chosen to compress X form a prefix of the encoder output and constitute what is called the overhead of the encoding procedure. If we wish to choose the best block code for compressing X, from among block codes of all orders, we would choose the block code in order to minimize the total of the overhead codebits and the encoded datavector codebits. One could also adopt this approach to code design in order to choose the best finite memory code for compressing X, or, more generally, the best finite-state code.

EXAMPLE 1. Suppose we wish to compress English text using finite memory codes. A finite memory code of order zero entails 51 bits of overhead. (Represent the Kraft vector used as a binary tree with 26 terminal nodes and 2 × 26 − 1 = 51 nodes all together. You have to let the decoder know how to grow this tree; it takes one bit of information at each of the 51 nodes to do that, since the decoder will either grow two branches at each node, or none.) A finite memory first order code for English text will entail 27 × 51 = 1377 bits of overhead. (You need a codebook of 27 different codes, with 51 bits to describe each code.) A finite memory second order code for English text can be described with 677 × 51 = 34527 bits of overhead. (There are 26^2 + 1 = 677 codes in the codebook, in this case.) You would keep increasing the order of your finite memory code until you find the order for which you have minimized the sum of the amount of overhead plus the length of the encoded English text via the best finite memory code of that order.
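The arithmetic in Example 1 is easily mechanized. The MATLAB sketch below reproduces the overhead figures quoted above; the count of 26^j + 1 codes for an order-j codebook (j >= 1) is our reading of the figures 27 and 677 in the example, not something stated there explicitly.

%A sketch (not from these notes) checking the overhead counts in Example 1
bits_per_code = 2*26 - 1;             %51 bits to describe one code
for j = 0:2
  if j == 0
    ncodes = 1;                       %order zero: a single code
  else
    ncodes = 26^j + 1;                %our reading of the counts 27 and 677
  end
  fprintf('order %d: %d codes, %d overhead bits\n', j, ncodes, ncodes*bits_per_code);
end

Running this prints 51, 1377, and 34527 overhead bits for orders 0, 1, and 2, respectively.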

It would be nice to have a compression technique that entails no overhead, while performing at least as well as the block codes, or the finite memory codes, or the finite-state codes (provided the length of the datavector is long enough). Overhead is caused because statistics of the datavector (consisting of various frequency counts) are collected first and then used to choose the code. Since the code arrived at depends on these statistics, overhead is needed to describe the code. Suppose instead that information about the datavector is collected "on the fly" as you encode the samples in the datavector from left to right: in encoding the current sample (or group of samples), you could use information collected about the previously encoded samples. A code which operates in this way might not need any overhead to describe it. Codes like this which require no overhead at the decoder output are called adaptive codes. The Lempel-Ziv code, the subject of this chapter, will be our first example of an adaptive code. There are quite a number of variants of the Lempel-Ziv code. The variant we shall describe in this chapter is called LZ78, after the date of the paper [1].

5.1 Lempel-Ziv Parsing

In block coding, you first partition the datavector into blocks of equal length. In Lempel-Ziv coding, you start by partitioning the datavector into variable-length blocks instead. The procedure via which this partitioning takes place is called Lempel-Ziv parsing. The first variable-length block arising from the Lempel-Ziv parsing of the datavector X = (X_1, X_2, ..., X_n) is the single sample X_1. The second block in the parsing is the shortest prefix of (X_2, ..., X_n) which is not equal to X_1. Suppose this second block is (X_2, ..., X_j). Then the third block in the Lempel-Ziv parsing will be the shortest prefix of (X_{j+1}, ..., X_n) which is not equal to either X_1 or (X_2, ..., X_j). In general, suppose the Lempel-Ziv parsing procedure has produced the first k variable-length blocks B_1, B_2, ..., B_k in the parsing, and X^(k) is the part of X left after B_1, B_2, ..., B_k have been removed. Then the next block B_{k+1} in the parsing is the shortest prefix of X^(k) which is not equal to any of the preceding blocks B_1, B_2, ..., B_k. (If there is no such block, then B_{k+1} = X^(k) and the Lempel-Ziv parsing procedure terminates.) By construction, the variable-length blocks B_1, B_2, ..., B_t produced by the Lempel-Ziv parsing of X are distinct, except that the last block B_t could be equal to one of the preceding ones.

EXAMPLE 2. The Lempel-Ziv parsing of X = (1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1) is

  B_1 = 1, B_2 = 10, B_3 = 11, B_4 = 0, B_5 = 00, B_6 = 110, B_7 = 1    (1)

This parsing can also be accomplished via MATLAB. Here are the results of a MATLAB session that the reader can try:

x = [1 1 0 1 1 0 0 0 1 1 0 1];
y = LZparse(x);
print_bitstrings(y)

The MATLAB function LZparse(x) (the m-file of which is given in Section 5.6) gives the indices of the variable-length blocks in the Lempel-Ziv parsing of the datavector x. Using the MATLAB function print_bitstrings, we were able to print out the blocks in the parsing on the screen.
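For readers who would rather see the blocks themselves than their indices, here is a self-contained sketch of the parsing loop. It is not the LZparse m-file of Section 5.6, and the function name lz_parse_blocks is ours; it returns the blocks as a cell array of row vectors.

%A sketch; save as lz_parse_blocks.m
%blocks = lz_parse_blocks(x) is a cell array containing the
%variable-length blocks in the Lempel-Ziv parsing of x
function blocks = lz_parse_blocks(x)
blocks = {};                          %blocks found so far
pos = 1;                              %position of the first unparsed sample
n = length(x);
while pos <= n
  len = 1;
  %extend the candidate block until it differs from all previous blocks
  while pos+len-1 < n && any(cellfun(@(b) isequal(b, x(pos:pos+len-1)), blocks))
    len = len + 1;
  end
  blocks{end+1} = x(pos:pos+len-1);
  pos = pos + len;
end

For the datavector of Example 2, lz_parse_blocks([1 1 0 1 1 0 0 0 1 1 0 1]) returns the seven blocks 1, 10, 11, 0, 00, 110, 1 listed in (1).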

5.2 Lempel-Ziv Encoder

We suppose that the alphabet from which our datavector X = (X_1, X_2, ..., X_n) is formed is A = {0, 1, ..., k − 1}, where k is a positive integer. After obtaining the Lempel-Ziv parsing B_1, B_2, ..., B_t of X, the next step is to represent each block in the parsing as a pair of integers. The first block in the parsing, B_1, consists of a single symbol. It is represented as the pair (0, B_1). More generally, any block B_j of length one is represented as the pair (0, B_j). If the block B_j is of length greater than one, then it is represented as the pair (i, s), where s is the last symbol in B_j and B_i is the block in the parsing which coincides with the block obtained by removing s from the end of B_j. (By construction of the Lempel-Ziv parsing, there will always be such a block B_i.)

EXAMPLE 3. The sequence of pairs corresponding to the parsing (1) is

  (0, 1), (1, 0), (1, 1), (0, 0), (4, 0), (3, 0), (0, 1)    (2)

For example, (4, 0) corresponds to the block 00 in the parsing. Since the last symbol of 00 is 0, the pair (4, 0) ends in 0. The 4 in the first entry refers to the fact that B_4 = 0 is the preceding block in the parsing which is equal to what we get by deleting the last symbol of 00.

For our next step, we replace each pair (i, s) by the integer ki + s. Thus, the sequence of pairs (2) becomes the sequence of integers

  2·0 + 1 = 1, 2·1 + 0 = 2, 2·1 + 1 = 3, 2·0 + 0 = 0, 2·4 + 0 = 8, 2·3 + 0 = 6, 2·0 + 1 = 1    (3)

To finish our description of the encoding process in Lempel-Ziv coding, let I_1, I_2, ..., I_t denote the integers corresponding to the blocks B_1, B_2, ..., B_t in the Lempel-Ziv parsing of the datavector X. Each integer I_j is expanded to base two and these binary expansions are "padded" with zeroes on the left so that the overall length of the string of bits assigned to I_j is ⌈log_2(kj)⌉. The reason why this many bits is necessary and sufficient is seen by examining the largest that I_j can possibly be. Let (i, s) be the pair associated with I_j. Then the biggest that i can be is j − 1 and the biggest that s can be is k − 1. Thus the biggest that I_j can be is k(j − 1) + k − 1 = kj − 1, and the number of bits in the binary expansion of kj − 1 is ⌈log_2(kj)⌉.

Let W_j be the string of bits of length ⌈log_2(kj)⌉ assigned to I_j as described in the preceding. Then the Lempel-Ziv encoder output is obtained by concatenating together the strings W_1, W_2, ..., W_t. To illustrate, suppose a binary datavector has seven blocks B_1, B_2, ..., B_7 in its Lempel-Ziv parsing (such as in Example 2). These blocks are assigned, respectively, strings of codebits W_1, W_2, W_3, W_4, W_5, W_6, W_7 of lengths ⌈log_2(2)⌉ = 1 bit, ⌈log_2(4)⌉ = 2 bits, ⌈log_2(6)⌉ = 3 bits, ⌈log_2(8)⌉ = 3 bits, ⌈log_2(10)⌉ = 4 bits, ⌈log_2(12)⌉ = 4 bits, and ⌈log_2(14)⌉ = 4 bits. Therefore, any binary datavector with seven blocks in its Lempel-Ziv parsing would result in an encoder output of length 1 + 2 + 3 + 3 + 4 + 4 + 4 = 21 codebits. In particular, for the datavector in Example 2, the seven strings W_1, ..., W_7 are (referring to (3)):

  W_1 = 1
  W_2 = 10
  W_3 = 011
  W_4 = 000
  W_5 = 1000
  W_6 = 0110
  W_7 = 0001

Concatenating, we see that the encoder output from the Lempel-Ziv coding of the datavector in Example 2 is

  110011000100001100001
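The whole encoding pipeline of this section (parse, form pairs, map each pair to the integer ki + s, pad to ⌈log_2(kj)⌉ bits, concatenate) fits in a few lines of MATLAB. The sketch below builds on the hypothetical lz_parse_blocks of Section 5.1; it is our illustration, not an m-file from these notes.

%A sketch; save as lz_encode.m
%w = lz_encode(x, k) is the Lempel-Ziv codeword (a character string
%of bits) for the datavector x over the alphabet {0, 1, ..., k-1}
function w = lz_encode(x, k)
blocks = lz_parse_blocks(x);
w = '';
for j = 1:length(blocks)
  B = blocks{j};
  s = B(end);                         %last symbol of the block
  if length(B) == 1
    i = 0;                            %pair (0, s)
  else
    %index of the earlier block equal to B with its last symbol removed
    i = find(cellfun(@(c) isequal(c, B(1:end-1)), blocks(1:j-1)), 1);
  end
  I = k*i + s;                        %the integer k*i + s
  nbits = ceil(log2(k*j));            %codebits allotted to block j
  w = [w dec2bin(I, nbits)];          %pad with zeroes on the left
end

As a check, lz_encode([1 1 0 1 1 0 0 0 1 1 0 1], 2) returns the 21-bit string 110011000100001100001 computed above.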

5.3 Lempel-Ziv Decoder

Suppose a datavector X with alphabet {0, 1, 2} was Lempel-Ziv encoded and the encoder output is:

  001000010101010110000100000    (4)

Let us decode to get X. For an alphabet of size three, ⌈log_2(3j)⌉ codebits are allocated to the j-th block in the Lempel-Ziv parsing. This gives us the following table of codebit allocations:

  parsing block number:  1  2  3  4  5  6  7
  # of codebits:         2  3  4  4  4  5  5

Partitioning the encoder output (4) according to the allocations in the above table, we obtain the partition

  00, 100, 0010, 1010, 1011, 00001, 00000

Converting these to integer form, we get

  0, 4, 2, 10, 11, 1, 0

Dividing each of these integers by three and recording quotient and remainder in each case, we get the pairs

  (0, 0), (1, 1), (0, 2), (3, 1), (3, 2), (0, 1), (0, 0)

Working backward from these pairs, we obtain the Lempel-Ziv parsing

  0, 01, 2, 21, 22, 1, 0

and the datavector

  X = (0, 0, 1, 2, 2, 1, 2, 2, 1, 0)
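The decoding steps just carried out by hand can likewise be sketched in MATLAB. As with lz_encode, this is our illustration rather than an m-file from the notes.

%A sketch; save as lz_decode.m
%x = lz_decode(w, k) recovers the datavector from the codeword w
%(a character string of bits) over the alphabet {0, 1, ..., k-1}
function x = lz_decode(w, k)
blocks = {};
pos = 1;                              %position in the codeword
j = 1;                                %number of the block being decoded
while pos <= length(w)
  nbits = ceil(log2(k*j));            %codebits allotted to block j
  I = bin2dec(w(pos:pos+nbits-1));    %integer for block j
  i = floor(I/k);                     %quotient: index of an earlier block
  s = mod(I, k);                      %remainder: last symbol of block j
  if i == 0
    blocks{j} = s;
  else
    blocks{j} = [blocks{i} s];        %earlier block with s appended
  end
  pos = pos + nbits;
  j = j + 1;
end
x = [blocks{:}];                      %concatenate the blocks

For instance, lz_decode('001000010101010110000100000', 3) returns the datavector (0, 0, 1, 2, 2, 1, 2, 2, 1, 0) found above.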

5.4 Lempel-Ziv Parsing Tree

In some implementations of Lempel-Ziv coding, both encoder and decoder grow from scratch a tree called the Lempel-Ziv parsing tree. Here is the Lempel-Ziv parsing tree for the datavector in Example 2:

  Figure 1: Lempel-Ziv Parsing Tree for Example 2

We explain to the reader the meaning of this tree. Label each left branch with a "1" and each right branch with a "0". For each node i (i = 1, ..., 6), write down the variable-length block consisting of the bits encountered along the path from the root node (labelled 0) to node i; this block B_i is then the i-th block in the Lempel-Ziv parsing of the datavector. For example, if we follow the path from node 0 to node 6, we see a left branch, a left branch, and a right branch, which converts to the block 110. Thus, the sixth block in the Lempel-Ziv parsing of our datavector is 110.

Let the datavector be X = (X_1, X_2, ..., X_n). The encoder grows the Lempel-Ziv parsing tree as follows. Suppose there are q distinct blocks in the Lempel-Ziv parsing, B_1, B_2, ..., B_q. Then the encoder grows trees T_1, T_2, ..., T_q. Tree T_1 consists of node 0, node 1, and a single branch going from node 0 to node 1 that is labelled with the symbol B_1 = X_1. For each i > 1, tree T_i is determined from tree T_{i−1} as follows:

(a) Remove B_1, ..., B_{i−1} from the beginning of X and let the resulting datavector be called X^(i).

(b) Starting at the root node of T_{i−1}, follow the path driven by X^(i) until a terminal node of T_{i−1} is reached. (The labels on the resulting path form a prefix of X^(i) which is one of the blocks B_j in {B_1, B_2, ..., B_{i−1}}, and the terminal node reached is labelled j.)

(c) Let s be the next symbol in X^(i) to appear after B_j. Grow a branch from node j of T_{i−1}, label this branch with the symbol s, and label the new node at the end of this branch as "node i". This new tree is T_i.

The decoder can also grow the Lempel-Ziv parsing tree as decoding of the compressed datavector proceeds from left to right. We will leave it to the reader to see how that is done. Growing a Lempel-Ziv parsing tree allows the encoding and decoding operations in Lempel-Ziv coding to be done in a fast manner. Also, there are modifications of Lempel-Ziv coding (not to be discussed here) in which enhancements in data compression are obtained by making use of the structure of the parsing tree.

5.5 Redundancy of LZ78

We want to see how much better the Lempel-Ziv code is than the block codes of various orders. We shall do this by comparing the Lempel-Ziv codeword length for the datavector to the block entropies of the datavector introduced in Chapter 4. It makes sense to make this comparison because the block entropies tell us how well the best block codes do.

The simplest case, which we discuss first, is to compare Lempel-Ziv code performance to the first order entropy. Let X = (X_1, X_2, ..., X_n) denote the datavector to be compressed, and let LZ(X) denote the length of the codeword assigned by the Lempel-Ziv code to X. Comparing LZ(X) to the first order entropy H_1(X), one can derive a bound of the form

  LZ(X) ≤ n H_1(X) + n δ_n    (5)

The term δ_n, which depends only on the datavector length n, is called the first order redundancy, and its units are bits per data sample. The better a data compression algorithm is, the smaller the redundancy will be. The following result gives the first order redundancy for the Lempel-Ziv code.

RESULT. The first order redundancy

  δ_n = C log_2 log_2 n / log_2 n    (6)

is achievable for the Lempel-Ziv code, where C is a positive constant that depends upon the size of the data alphabet. (In the preceding, we assume that the datavector length n is at least three, so that the redundancy will be well-defined.)

INTERPRETATION. We introduce some notation which makes it more convenient to talk about redundancy. If {z_n} is a sequence of real numbers, and {a_n} is a sequence of real numbers, we say that z_n is O(a_n) if there is a positive constant D such that z_n ≤ D|a_n| for all sufficiently large positive integers n. Using our new notation, we see that the above RESULT says that the first order redundancy of the Lempel-Ziv code is O(log_2 log_2 n / log_2 n) (where n denotes the length of the datavector).

What does our redundancy result say? Recall that H_1(X) is a lower bound on the compression rate that results when one compresses X using the best memoryless code that can be designed for X. Thus, the RESULT tells us that the Lempel-Ziv code yields a compression rate on any datavector of length n no worse than C log_2 log_2 n / log_2 n bits per sample more than the compression rate for the best memoryless code for the datavector. Since the quantity log_2 log_2 n / log_2 n is very small when n is large, we can achieve through Lempel-Ziv coding a compression performance approximately no worse than that achievable by the best memoryless code for the given datavector.

To show that the RESULT is true, we need the notion of unnormalized entropy. Let (Y_1, Y_2, ..., Y_m) be a datavector. (We allow the case in which each entry Y_i is itself a datavector; for example, the Y_i's may be blocks arising from a Lempel-Ziv parsing.) The unnormalized entropy H*(Y_1, ..., Y_m) of the datavector (Y_1, ..., Y_m) is defined to be m, the length of the datavector, times the first order entropy H_1(Y_1, ..., Y_m) of the datavector. This gives us the formula

  H*(Y_1, ..., Y_m) = Σ_{i=1}^m − log_2 p(Y_i)    (7)

where p is the probability distribution on the set of entries of the datavector which assigns to each entry Y the probability p(Y) defined by

  p(Y) = #{1 ≤ i ≤ m : Y_i = Y} / m

(In other words, p is the first-order empirical distribution for the datavector (Y_1, ..., Y_m).)

In this argument, fix an arbitrary datavector X = (X_1, X_2, ..., X_n). Let (B_1, B_2, ..., B_t) be the Lempel-Ziv parsing of the datavector. From Exercise 4 at the end of this chapter, we have the following inequality:

  H*(B_1, B_2, ..., B_t) ≤ H*(X_1, ..., X_n) + H*(|B_1|, |B_2|, ..., |B_t|)

where |B_i| denotes the length of the block B_i. Since the blocks B_1, B_2, ..., B_{t−1} are distinct,

  (t − 1) log_2(t − 1) = H*(B_1, ..., B_{t−1}) ≤ H*(B_1, ..., B_t)

We know that

  LZ(X) = Σ_{i=1}^t ⌈log_2(ki)⌉

where k is the size of the data alphabet. Expanding out the right side of the preceding equation, one can see that there is a constant c_1 such that

  LZ(X) ≤ (c_1 + k) t + (t − 1) log_2(t − 1)

for all datavectors X. Combining the preceding three displays,

  LZ(X) ≤ (c_1 + k) t + H*(B_1, ..., B_t) ≤ (c_1 + k) t + H*(X_1, ..., X_n) + H*(|B_1|, ..., |B_t|)

From Exercise 6 at the end of the chapter (applied with N = n),

  H*(|B_1|, ..., |B_t|) ≤ t log_2(1 + log_e n) + Σ_{i=1}^t log_2 |B_i|

By concavity of the logarithm function,

  Σ_{i=1}^t log_2 |B_i| ≤ t log_2( (Σ_{i=1}^t |B_i|) / t ) = t log_2(n/t)

Since H*(X_1, ..., X_n) = n H_1(X), it follows that

  LZ(X)/n ≤ H_1(X) + δ(X)

where

  δ(X) = (c_1 + k)(t/n) + (t/n) log_2(1 + log_e n) + (t/n) log_2(n/t)    (8)

By Exercise 8 at the end of the chapter, there is a constant c_2 such that

  t ≤ c_2 n / log_2 n

Applying this to the three terms on the right side of (8), it is seen that

  δ(X) = O(1 / log_2 n) + O(log_2 log_2 n / log_2 n) + O(log_2 log_2 n / log_2 n)

Of the three terms on the right above, the second and third terms are dominant. We have achieved the bound (5) with δ_n given by (6). The RESULT is proved.

We now want to compare the compression performance of the Lempel-Ziv code to the performance of block codes of an arbitrary order j. Consider an arbitrary datavector X = (X_1, ..., X_n) whose length n is a multiple of j. By a complicated argument similar to the argument given above for j = 1 (which we omit), it can be shown that there is a constant C_j such that

  LZ(X)/n ≤ H_j(X) + C_j log_2 log_2 n / log_2 n    (9)

The second term on the right above is the j-th order redundancy of the Lempel-Ziv code. In other words, relation (9) tells us that for any j, the j-th order redundancy of the Lempel-Ziv code is O(log_2 log_2 n / log_2 n), which becomes very small as n gets large.
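To get a feel for how slowly this redundancy bound decays, one can tabulate log_2 log_2 n / log_2 n for a few datavector lengths (a quick numerical illustration of the bound, ignoring the constant C_j):

for n = 10.^(3:3:12)
  fprintf('n = %g: %.3f\n', n, log2(log2(n))/log2(n));
end

The values decrease from about 0.33 at n = 10^3 to about 0.13 at n = 10^12, so the redundancy does go to zero, but slowly.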

Recall from Chapter 4 that H_j(X) is a lower bound on the compression rate of the best j-th order block code for X. We conclude that no matter how large the order of the block code that one attempts to use, the Lempel-Ziv algorithm will yield a compression rate on an arbitrary datavector approximately no worse than that of the block code, provided the datavector is long enough relative to the order of the block code. Hence, one loses nothing in compression rate by using the Lempel-Ziv code instead of a block code.

Also, one is able to compress a datavector faster via the Lempel-Ziv code than via block coding. To see this, one need only look at memoryless codes. For a datavector of length n, the overall time for best compression of the datavector via a memoryless code is proportional to n^2. (The overall compression time in this case would be the time it takes to design the Huffman code for the datavector plus the time it takes to compress the datavector with the Huffman code; since the first time is proportional to n^2 and the second time is proportional to n, the overall compression time is proportional to n^2.) On the other hand, if the Lempel-Ziv code is implemented properly, it will take time proportional to n to compress any datavector of length n. (No time is wasted on design; the Lempel-Ziv code structure is the same for every datavector.) We conclude:

  Lempel-Ziv coding yields a compression performance as good as or better than the best block codes (provided the datavector is long enough).

  Lempel-Ziv coding yields faster compression of the data than does coding via the best block codes, because no time is wasted on design.

The Lempel-Ziv code has been our first example of a code which does at least as well as the block codes, in the sense that the redundancies of all orders become small with large datavector length. Such codes are called universal codes. Although the Lempel-Ziv code is a universal code, there are universal codes whose redundancy goes to zero faster with increasing datavector length than does the redundancy of the Lempel-Ziv code. This point is discussed further in a later chapter.

5.6 MATLAB m-files

We present two MATLAB programs in connection with Lempel-Ziv coding: LZparse and LZcodelength.

LZparse.m

Here is the m-file for the MATLAB function LZparse:

%This m-file is called LZparse.m
%It accomplishes Lempel-Ziv parsing of a binary
%datavector
%x is a binary datavector
%y = LZparse(x) is a vector consisting of the indices
%of the blocks in the Lempel-Ziv parsing of x
%
function y = LZparse(x)
N = length(x);
dict = [];                       %indices of the blocks found so far
lengthdict = 0;                  %number of samples parsed so far
while lengthdict < N
  i = lengthdict + 1;
  k = 0;
  while k == 0
    v = x(lengthdict+1:i);       %candidate block
    j = bitstring_to_index(v);   %index of the candidate block
    A = (dict ~= j);
    k = prod(A);                 %k = 1 iff v differs from every block so far
    if i == N
      k = 1;                     %no longer block is possible; take what is left
    else
      i = i + 1;
    end
  end
  dict = [dict j];
  lengthdict = lengthdict + length(v);
end
y = dict;

The function LZparse was illustrated in the MATLAB session of Section 5.1.
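LZparse calls a helper function bitstring_to_index, whose m-file is not reproduced in these notes. Any map giving distinct bitstrings distinct indices will do; one standard choice, offered here only as an assumption about what that helper computes, maps the bitstring b_1 ... b_L to the integer with binary expansion 1 b_1 ... b_L:

%A sketch of the unlisted helper; save as bitstring_to_index.m
%j = bitstring_to_index(v) is a positive integer that identifies
%the bitstring v uniquely (distinct bitstrings get distinct indices)
function j = bitstring_to_index(v)
L = length(v);
j = 2^L;                         %a leading 1 records the length of v
for m = 1:L
  j = j + v(m)*2^(L-m);          %followed by the bits of v
end

With this choice, the strings 0, 1, 00, 01, 10, 11, ... receive the indices 2, 3, 4, 5, 6, 7, ..., and print_bitstrings would invert the map.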

LZcodelength.m

Here is the m-file for the MATLAB function LZcodelength:

%This m-file is named LZcodelength.m
%x = a binary datavector
%LZcodelength(x) = length in codebits of the encoder
%output resulting from the Lempel-Ziv coding of x
%
function y = LZcodelength(x)
u = LZparse(x);
t = length(u);                   %number of blocks in the parsing
S = 0;
for i = 1:t
  S = S + ceil(log2(2*i));       %codebits allotted to block i (alphabet size 2)
end
y = S;

To illustrate the MATLAB function LZcodelength, we performed the following MATLAB session:

x = [1 1 0 1 1 0 0 0 1 1 0 1];
LZcodelength(x)

ans = 21

As a result of this session, we computed the length of the codeword resulting from the Lempel-Ziv encoding of the datavector in Example 2, and "21" was printed out on the screen. This is the correct length of this codeword, as computed earlier in these notes.
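As a further check, the codeword length computed by LZcodelength agrees with the length of the codeword produced by the hypothetical lz_encode sketch of Section 5.2 (assuming the bitstring_to_index sketch above for LZparse):

x = [1 1 0 1 1 0 0 0 1 1 0 1];
isequal(length(lz_encode(x, 2)), LZcodelength(x))   %prints ans = 1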

5.7 Exercises

1. What is the minimum number of variable-length blocks that can appear in the Lempel-Ziv parsing of a binary datavector of length 28? What is the maximum number?

2. Find the binary codeword that results when the datavector is encoded using the Lempel-Ziv code.

3. The alphabet of a datavector is {0, 1, 2}. The codeword results when the datavector is Lempel-Ziv encoded. Find the datavector.

4. Let X = (X_1, X_2, ..., X_n) be a datavector and let B_1, B_2, ..., B_t be variable-length blocks into which X is partitioned (from left to right). Show that

  H*(B_1, B_2, ..., B_t) ≤ H*(X) + H*(|B_1|, |B_2|, ..., |B_t|)    (10)

where |B_i| is the length of B_i. (Inequality (10) can be proved by grouping appropriately the terms that appear in the summation giving the unnormalized entropy H*(B_1, B_2, ..., B_t); see formula (7).)

5. Let (X_1, X_2, ..., X_n) be a datavector and let A be the data alphabet. Show that

  H*(X_1, X_2, ..., X_n) ≤ Σ_{i=1}^n − log_2 p(X_i)

for every probability distribution p on A. (Hint: Use the fact that

  Σ_{a ∈ A} p_1(a) log_2 (p_1(a) / p_2(a)) ≥ 0

for any two probability distributions p_1, p_2 on A; see Exercise 1 of Chapter 3.)

6. Consider a datavector (X_1, X_2, ..., X_n) in which each sample X_i is a positive integer less than or equal to N. Show that

  0 ≤ H*(X_1, X_2, ..., X_n) ≤ n log_2(1 + log_e N) + Σ_{i=1}^n log_2 X_i

(Hint: First, use the result of Exercise 5 with the probability distribution

  p(j) = (1/j) / (1 + (1/2) + (1/3) + ... + (1/N)),  j = 1, ..., N

Then use the inequality

  (1/2) + (1/3) + ... + (1/N) ≤ ∫_1^N (1/x) dx = log_e N

which can be seen by approximating the area under the curve y = 1/x by a sum of areas of rectangles.)

7. Let A be an arbitrary finite alphabet. Define L_lz(n) to be the minimum Lempel-Ziv codeword length assigned to the datavectors of length n over the alphabet A. Show that

  lim_{n → ∞} log_2 L_lz(n) / log_2 n = 1/2

This property points out a hidden defect of the Lempel-Ziv code. Because the limit on the left above is greater than zero, there exist certain datavectors which the Lempel-Ziv code does not compress very well.

8. Consider all datavectors of all lengths over a fixed finite alphabet A. If X is such a datavector, let t(X) denote the number of variable-length blocks that appear in the Lempel-Ziv parsing of X. Show that there is a constant M (depending on the size of the alphabet A) such that for any integer n ≥ 2 and any datavector X of length n,

  t(X) ≤ M n / log_2 n

(Hint: Let t = t(X) and let B_1, B_2, ..., B_{t−1} be the first t − 1 variable-length blocks in the Lempel-Ziv parsing of X. In the inequality

  |B_1| + |B_2| + ... + |B_{t−1}| ≤ n

find a lower bound for the left hand side, using the fact that the B_i's are distinct.)

9. We discuss a variant of the Lempel-Ziv code which yields shorter codewords for some datavectors than does LZ78. Encoding is accomplished via three steps. In Step 1, we partition the datavector (X_1, ..., X_n) into variable-length blocks in which the first block is of length one, and each succeeding block (except for possibly the last block) is the shortest prefix of the rest of the datavector which cannot be seen in a window of the same length beginning at an earlier position in the datavector. To illustrate, the datavector (0, 0, 0, 1, 1, 0) is partitioned into

  0, 001, 10    (11)

in Step 1. (On the other hand, LZ78 partitions this datavector into four blocks instead of three: 0, 00, 1, 10.) In Step 2, each block B in the sequence of blocks from Step 1 is represented as a triple (i, j, k) in which k is the last symbol in B, i is the length of the block B, and j is the smallest integer such that if we look at the i − 1 samples in the datavector starting with sample X_j, we will see windowed the block obtained by removing the last symbol from B. (Take j = 0 if B has length one.) For example, for the blocks in (11), Step 2 gives us the triples

  (1, 0, 0), (3, 1, 1), (2, 4, 0)

In Step 3, the sequence of triples from Step 2 is converted into a binary codeword. There is a clever way to do this which we shall not discuss here. All we need to know for the purposes of this exercise is that if there are t triples and the datavector length is n, then the approximate length of the binary codeword is t log_2 n.

(a) Show that there are infinitely many binary datavectors such that Step 1 yields a partition of the datavector into 5 blocks.

(b) Let X^(n) be the datavector consisting of n zeroes. Let LZ*(X^(n)) be the length of the binary codeword which results when X^(n) is encoded using the variant of the Lempel-Ziv code. Show that LZ*(X^(n)) / LZ(X^(n)) converges to zero as n → ∞.

References

[1] J. Ziv and A. Lempel, "Compression of individual sequences via variable-rate coding," IEEE Trans. Inform. Theory, vol. 24, pp. 530-536, 1978.
