Lecture on Computer Networks


1 Lecture on Computer Networks Historical Development Copyright (c) 2008 Dr. Thomas Haenselmann (Saarland University, Germany). Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

2 Source coding in data networks Motivation for source coding in data networks This short introduction addresses the problem of how to code payload optimally for the network. Theoretically, for an error-free communication channel, no special precautions are necessary; we could simply forward the bits. More realistically, we know that the channel introduces errors, so it can make sense to send more bits than just the data itself to account for the erroneous channel.

3 Source coding in data networks Error control: Detecting vs. Correcting Basically, two variants are distinguished: error detecting codes and error correcting codes (Forward Error Correction). Both variants need a certain amount of redundancy. Which variant makes sense in which situation? Error detection: Detection might be sufficient if the sink can ask the source for a repeated transmission. For this, a feedback channel and a defined protocol are necessary. In some cases, erroneous data can simply be deleted without retransmission (e.g., IP-telephony).

4 Source coding in data networks Error control: Detecting vs. Correcting Error correcting codes: A larger amount of redundancy is necessary. This makes sense if no feedback channel is available or a retransmission would delay the packet considerably (e.g., telephony, video conferences). What is better: forward error correction or retransmission?

5 Source coding in data networks Error control: The Hamming-Distance The Hamming-Distance of two codewords corresponds exactly to the number of bits in which the codewords differ. It is calculated using the XOR operation, e.g. 10001001 XOR 10110001 = 00111000, i.e. distance 3. In other words: if two codewords have the Hamming-Distance d, then it is necessary to toggle d bits to transform one codeword into the other.
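A minimal C++ sketch (function name illustrative): XOR the two codewords and count the 1-bits.

#include <bitset>
#include <cstdint>
#include <iostream>

// Hamming distance = number of 1-bits in the XOR of the two codewords.
unsigned hamming_distance(std::uint32_t a, std::uint32_t b) {
    return std::bitset<32>(a ^ b).count();
}

int main() {
    // 10001001 XOR 10110001 = 00111000 -> distance 3
    std::cout << hamming_distance(0b10001001, 0b10110001) << std::endl;
}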

6 Source coding in data networks Error control: The Hamming-Distance The Hamming-Distance of an entire code is defined as the smallest possible distance two arbitrary (but different) codewords of a code can have. To detect errors we need a code with valid and invalid words. To be able to detect d bit errors in a codeword the code must have a distance of d+1. Why does less distance not suffice?

7 Source coding in data networks Error control: Error correcting codes Error correcting codes need a distance of 2d+1 if d errors need to be detected and corrected. Why does less distance not suffice? With a distance of 2d+1, 2d+1 bits have to be toggled to get from one valid codeword to another. If only d bits are toggled by errors, getting back to the original codeword costs exactly d toggles, whereas reaching any other valid codeword costs at least d+1 toggles. Hence, as long as at most d bit errors occur, the original codeword is always the uniquely nearest valid codeword. Example: 2 data bits are encoded by 10-bit codewords. Orig. code: 00, 01, 10, 11. Error-correcting code: 0000000000, 0000011111, 1111100000, 1111111111. How many bit errors can be detected and corrected here?

8 Source coding in data networks Error control: Redundancy estimation of error correcting codes How many bits do we need at least for the correction of a bit error? We want to have 2^m valid codewords; r correction bits are needed. The error correcting code will have n = (m + r) bits in total. By toggling one of the n bits (redundant bits may be toggled, too) we get an (illegal) codeword with a distance of 1. That means, for each of the 2^m valid words we can create n invalid words. Hence follows

(n + 1) * 2^m <= 2^n

9 Source coding in data networks Error control: Redundancy estimation of error correcting codes How many bits do we need at least for the correction of a bit error? (n + 1) is composed of n invalid words (each created by one bit error) plus the one valid codeword. The number of words of the error correcting code stands on the right side of the inequality. n can be written as m data bits plus r correction bits:

(m + r + 1) * 2^m <= 2^(m+r), hence (m + r + 1) <= 2^r

With a given m we can estimate the number of correction bits a code needs in any case.
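The inequality directly yields the minimal r for a given m; a small C++ sketch (assuming the estimate (m + r + 1) <= 2^r derived above):

#include <iostream>

// Smallest number of check bits r satisfying m + r + 1 <= 2^r.
int min_check_bits(int m) {
    int r = 1;
    while (m + r + 1 > (1 << r))
        ++r;
    return r;
}

int main() {
    for (int m : {4, 8, 16, 32, 64})
        std::cout << m << " data bits need at least "
                  << min_check_bits(m) << " check bits" << std::endl;
}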

10 Source coding in data networks Error control: The error correcting Hamming code The Hamming code achieves the minimum of the previous estimation. Algorithm: Number the bits starting at 1. All bits at positions that are powers of two (1, 2, 4, 8, ...) are check bits; the rest are data bits. The data bits are filled up from left to right with the actual data; the check bits only depend on the data bits. Which data bit influences which check bit?

11 Source coding in data networks Error control: The error correcting Hamming code To see this, convert the (order) number of a bit into its binary representation: 11 = 1011 (binary) = 8 + 2 + 1. Data bit 11 hence influences check bits 8, 2 and 1. Check bit 1 is of course also influenced by the data bits 3, 5, 7 and 9. By definition, all check bits combined with their data bits have to exhibit an even (or odd, depending on what was agreed upon) parity.

12 Source coding in data networks Error control: The error correcting Hamming code [Example on the slide: a data word is shown without check bits, then converted into the full codeword with the check bits filled in.] Correction procedure: First, a counter is set to zero. Then, the check bits are checked one by one. If one of them produces the wrong parity, the number of the corresponding check bit is added to the counter. In the end, the counter points to the toggled bit.
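A C++ sketch of the whole scheme with even parity (bit positions are 1-based as on the slides; the data word and the function names are illustrative):

#include <iostream>
#include <vector>

// Encode data bits into a Hamming codeword code[1..n]; positions that are
// powers of two carry check bits, even parity is assumed.
std::vector<int> hamming_encode(const std::vector<int>& data) {
    int m = data.size(), r = 1;
    while (m + r + 1 > (1 << r)) ++r;        // minimal number of check bits
    int n = m + r;
    std::vector<int> code(n + 1, 0);
    for (int pos = 1, d = 0; pos <= n; ++pos)
        if (pos & (pos - 1))                 // not a power of two: data bit
            code[pos] = data[d++];
    for (int p = 1; p <= n; p <<= 1) {       // check bit p covers all positions
        int parity = 0;                      // whose binary number contains p
        for (int pos = p + 1; pos <= n; ++pos)
            if (pos & p) parity ^= code[pos];
        code[p] = parity;                    // make the group parity even
    }
    return code;
}

// Correction procedure from the slide: add up the numbers of all check bits
// with wrong parity; the counter then points to the toggled bit (0 = none).
int defective_bit(const std::vector<int>& code) {
    int n = code.size() - 1, counter = 0;
    for (int p = 1; p <= n; p <<= 1) {
        int parity = 0;
        for (int pos = 1; pos <= n; ++pos)
            if (pos & p) parity ^= code[pos];
        if (parity) counter += p;
    }
    return counter;
}

int main() {
    std::vector<int> code = hamming_encode({1, 0, 1, 1}); // 4 data bits -> n = 7
    code[5] ^= 1;                                         // inject a 1-bit error
    std::cout << "defective bit: " << defective_bit(code) << std::endl; // 5
}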

13 Source coding in data networks Error control: The error correcting Hamming code Another interpretation of the error recovery scheme: Why does the counter value reference the defective bit? Check bit 1 wrong? yes: bits 3, 5, 7, 9 and 11 may be defective. Check bit 2 wrong? no: only bits 5 and 9 are left. Check bit 4 wrong? yes: only bit 5 is left.

14 Source coding in data networks Error control: Cyclic Redundancy Check (CRC) CRC is based on the idea of polynomial division. Remember:

(x^5 + x^3 + x + 1) : (x + 1) = x^4 - x^3 + 2x^2 - 2x + 3 + [-2/(x+1)]
- (x^5 + x^4)
  -----------
   -x^4 + x^3
- (-x^4 - x^3)
  ------------
    2x^3 + x
- (2x^3 + 2x^2)
  -------------
   -2x^2 + x
- (-2x^2 - 2x)
  ------------
    3x + 1
- (3x + 3)
  --------
   -2   <- remainder or modulus

Check: [x^4 - x^3 + 2x^2 - 2x + 3 - 2/(x+1)] * (x+1) = x^5 + x^3 + x + 1. What's the difference between polynomial division and normal division?

15 Source coding in data networks Error control: Cyclic Redundancy Check A bit string is interpreted as a polynomial by numbering the bits consecutively and, if a bit is set, adding the corresponding term to the polynomial. In other words: use the bits as coefficients. Example:

position:  7 6 5 4 3 2 1 0
data bits: 1 1 0 1 0 1 0 1

1*x^7 + 1*x^6 + 0*x^5 + 1*x^4 + 0*x^3 + 1*x^2 + 0*x^1 + 1*x^0 = x^7 + x^6 + x^4 + x^2 + 1 is the polynomial corresponding to the given data bits.

16 Source coding in data networks Error control: Cyclic Redundancy Check The principle of CRC: Sender and recipient agree upon a divisor polynomial, also called the generator polynomial. Then g zeros are appended to the message, g being the degree of the generator polynomial. In the next step, the sender divides the message (extended by the g zeros) by the generator polynomial, as in the polynomial division above. In most cases there will be a remainder; the quotient itself is of no interest. The remainder is then subtracted from the message (extended by the zeros). The resulting bit string is now transferred to the recipient. If the message was transmitted correctly, no remainder should emerge on the recipient's side. Why? Because the sender intentionally subtracted the remainder before sending the message. The g zeros which emerge after the division are interpreted by the recipient as an indication of an error-free transmission. Note: The sender can safely subtract the remainder without harming the message. Why? Because the remainder has at most g bits, the subtraction only changes the g appended zeros, never the message bits themselves.

17 Source coding in data networks Error control: Cyclic Redundancy Check The only difference from normal polynomial division: Calculations are binary, and after the calculation of each digit a modulo-2 operation is performed! In other words: always ignore the carry-over. This simplifies addition and subtraction significantly. Discovery: Both operations, plus and minus, are equivalent to the XOR operation.

18 Source coding in data networks Error control: Cyclic Redundancy Check (CRC) Example: Message: 1101011011. Generator polynomial: x^4 + x + 1 = 10011, 4th degree (5th order). Message extended by 4 zeros: 11010110110000. Division (the quotient 1100001010 does not matter):

11010110110000
10011               XOR
--------------
01001110110000
 10011              XOR
--------------
00000010110000
      10011         XOR
--------------
00000000101000
        10011       XOR
--------------
00000000001110      -> remainder = 1110

Special note: The division continues wherever the MSB (most significant bit) of the bit string currently being divided is set. Where it is 0, new bits (from the message) are pulled in until the generator polynomial fits under it again. 11010110110000 minus (XOR) 1110 = 11010110111110 = transmitted message, which generates no remainder when divided.
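The division is just a sequence of shift-and-XOR steps; a C++ sketch reproducing the example above (frame 1101011011, generator 10011):

#include <iostream>
#include <string>

// Modulo-2 division: returns the g-bit remainder of msg divided by gen.
// msg is assumed to be already extended by g zeros (g = degree of gen).
std::string crc_remainder(std::string msg, const std::string& gen) {
    std::size_t g = gen.size() - 1;
    for (std::size_t i = 0; i + gen.size() <= msg.size(); ++i)
        if (msg[i] == '1')                    // divide only where the MSB is set
            for (std::size_t j = 0; j < gen.size(); ++j)
                msg[i + j] = (msg[i + j] == gen[j]) ? '0' : '1';   // XOR
    return msg.substr(msg.size() - g);
}

int main() {
    std::string message = "1101011011", gen = "10011";    // x^4 + x + 1
    std::string rem = crc_remainder(message + "0000", gen);
    std::cout << "remainder: " << rem << std::endl;       // 1110
    // "subtracting" (XORing) the remainder onto the appended zeros:
    std::cout << "check: " << crc_remainder(message + rem, gen) << std::endl; // 0000
}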

19 Source coding in data networks Error control: (CRC) Recognized errors Which errors are recognized? For the following analysis, separate the error from the message: Let M(x) be the transmitted message (resp. the corresponding polynomial) including an error, and T(x) the original (error-free) message. Separate the transmitted message into M(x) = T(x) + E(x), with E(x) being the isolated error. Every bit which is set in E stands for a toggled bit in M. A sequence from the first 1-bit to the last 1-bit is called a burst error. A burst error can occur anywhere in E. Question: Does the following division by the generator polynomial G(x) produce a remainder? If not, we cannot detect the error. [T(x) + E(x)] : G(x) = remainder-free? T(x) : G(x) is divisible without any remainder, because we constructed the message exactly for this property. The analysis is therefore reduced to the question whether E(x) : G(x) erroneously results in no remainder, thus passing undetected.

20 Source coding in data networks Error control: (CRC) Recognized errors Which errors are recognized? 1-bit error: The burst consists of only one error bit. If the generator polynomial has more than one coefficient, E(x) with a leading 1 followed by zeros cannot be divided without a remainder. So we are on the safe side with regard to 1-bit errors. Our generator polynomial is at least as good as a parity bit. Example: 1000(...)0 : 101 = ... however far the division is continued as above, a remainder is always left.

21 Source coding in data networks Error control: (CRC) Recognized errors Which errors are recognized? 2-bit error: A 2-bit error must look like this: x^i + x^j (i > j), therefore x^j can be factored out, which results in x^j (x^(i-j) + 1). It has already been shown that a generator polynomial with more than one term cannot divide the factor x^j. When is a term (x^k + 1) divided? (with k = i - j) For a given generator polynomial this has to be tested for 2-bit bursts of different lengths. Here, the error (inevitably) has the form 10(...)01. What follows is an example program to test whether the generator polynomial x^15 + x^14 + 1 is useful for detecting 2-bit errors.

22 Source coding in data networks Error control: (CRC) Recognized errors Which errors are recognized?

#include <iostream>
#include <cstring>
using namespace std;

bool divisible(const char* bits, int len, const char* gen, int glen);

int main() {
    const int MAX_LENGTH = 60000;
    const char* generator = "1100000000000001";      // x^15 + x^14 + 1
    char* bit_string = new char[MAX_LENGTH + 1];

    for (int length = 2; length < MAX_LENGTH; length++) {
        if ((length % 100) == 0)
            cout << length << endl;

        for (int j = 1; j < length - 1; j++)         // clear bit string
            bit_string[j] = '0';
        bit_string[0] = '1';                         // error of the form 10(...)01
        bit_string[length - 1] = '1';
        bit_string[length] = 0;

        // test if divisible by the generator polynomial
        if (divisible(bit_string, length, generator, strlen(generator)) == true) {
            cout << "Division successful with length " << length << endl;
            break;
        }
    } // for

    delete[] bit_string;
} // main
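The divisible() function is not shown on the slide; a minimal sketch of what it might look like, doing the modulo-2 division from the previous slides and reporting only whether a remainder is left:

#include <string>

// Modulo-2 division (XOR steps as in the CRC example); returns true if the
// bit string leaves no remainder when divided by the generator polynomial.
bool divisible(const char* bits, int len, const char* gen, int glen) {
    std::string work(bits, bits + len);
    for (int i = 0; i + glen <= len; ++i)
        if (work[i] == '1')
            for (int j = 0; j < glen; ++j)
                work[i + j] = (work[i + j] == gen[j]) ? '0' : '1';   // XOR
    return work.find('1') == std::string::npos;   // all zeros: no remainder
}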

23 Source coding in data networks Error control: (CRC) Recognized errors Which errors are recognized? Error polynomials with an odd number of terms: Speculation: If the generator polynomial contains the factor (x + 1), an error string with an odd number of bits cannot be divided. Proof by contradiction: Assume E(x) has an odd number of terms and is divisible by (x + 1); then the factor can be extracted: E(x) = (x + 1) * Q(x). So far we only divided polynomials. Now, for the first time, we use them as functions and evaluate them for x = 1 (additions are still done modulo 2). On the one hand, (1 + 1) * Q(1) = 0 * Q(1) = 0. On the other hand, E(1) = 1, because E(x) contains an odd number of terms, each evaluating to 1. Thus, the result should have been 1, not 0. As a consequence, the factor (x + 1) cannot be extracted, and E(x) is not divisible by (x + 1) if it contains an odd number of terms (or error bits). Result: The generator polynomial should contain the factor (x + 1) to catch all errors with an odd number of bits.

24 Source coding in data networks Error control: (CRC) Recognized errors Which errors are recognized? Recognition of burst errors of length r: The burst error in E(x) could look like this: 000 1 (anything) 1 000. To move the last 1-bit to the very right, a factor can be extracted: E(x) = x^i * (x^(r-1) + ... + 1), with i being the number of zeros on the right side of the last 1. If the degree of the generator polynomial itself is r (hence it has r + 1 coefficients), the error of the form (x^(r-1) + ... + 1) cannot be divided either, because the generator polynomial is larger than the error to be divided. Example (decimal system): 99 : 100 is not divisible without a remainder (result 0, remainder 99). Detecting burst errors of length r is trivial in the sense that the error itself simply falls out at the end of the division as the remainder. Even if the burst is just as large as the generator polynomial (which means r + 1 bits), the division yields no remainder only if by chance the error coincides exactly with the generator polynomial. This is possible, but not very likely. Example (decimal system): 100 : 100 = 1 (remainder 0).

25 Source coding in data networks Principle of Data Fountains A number of sources generate a theoretically endless stream of packets which are derived from a file. All packets are pairwise different: source 1 (streaming file A), source 2 (streaming file A), ..., source n (streaming file A) -> sink (downloading file A). Given that the file consists of p packets, the sink needs to receive only any p+k packets to regenerate the file, no matter from which source and no matter which packets. Any choice of p+k packets will do (k < 5% of the file size). (Mystical, right?)

26 Source coding in data networks Principle of Data Fountains Mathematical foundation: Let F be a file split into chunks to form an n x m bit-matrix and let T be an invertible n x n bit-matrix (containing only 0 and 1).

T x F = M

The bit-matrix M is transmitted line-by-line over the net. At the receiver side, the original file is obtained by

T^-1 x M = F

If T is the identity matrix, then the sender simply sends the file chunk-by-chunk, as known from, e.g., FTP. Advantage: nothing to compute. Disadvantage: any missing chunk will corrupt the file.

27 Source coding in data networks Principle of Data Fountains Data fountains rather make use of a matrix T which is more densely populated with 1-bits. The gain is that we do not rely on each individual data chunk: any n chunks (or slightly more) will do. Algorithm, sender side: The sender splits a file into n data chunks (parts). Then, a random bit-vector of size n is created. The chunks with a corresponding 1-bit are XORed into a new chunk which is broadcast over the network. This process can be repeated endlessly to produce more and more packets from the same file.
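A C++ sketch of the sender side (the seed models the synchronized random number generator; all names are illustrative, not part of the slides):

#include <cstdint>
#include <random>
#include <vector>

using Chunk = std::vector<std::uint8_t>;

// Produce one encoded packet: the XOR of all chunks selected by a
// pseudo-random bit-vector. The receiver regenerates the same bit-vector
// from the same seed (e.g., a packet index).
Chunk encode_packet(const std::vector<Chunk>& chunks, std::uint64_t seed,
                    std::vector<bool>& bitvec_out) {
    std::mt19937_64 rng(seed);
    std::bernoulli_distribution coin(0.5);
    Chunk packet(chunks[0].size(), 0);
    bitvec_out.assign(chunks.size(), false);
    for (std::size_t i = 0; i < chunks.size(); ++i)
        if (coin(rng)) {
            bitvec_out[i] = true;
            for (std::size_t j = 0; j < packet.size(); ++j)
                packet[j] ^= chunks[i][j];   // XOR the selected chunk in
        }
    return packet;
}

int main() {
    std::vector<Chunk> chunks = {{0x01, 0x02}, {0x30, 0x40}, {0x55, 0x66}};
    std::vector<bool> bitvec;
    Chunk packet = encode_packet(chunks, 42, bitvec);  // packet with index 42
    (void)packet;  // would be broadcast over the network
}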

28 Source coding in data networks Principle of Data Fountains Receiver: It collects n packets with the corresponding bit-vectors. The receiver knows the bit-vectors used by the sender because it uses exactly the same random number generator. After having obtained n packets with linearly independent bit-vectors (forming the matrix T), it inverts T and transforms the message M back into the original file F. (The random number sources have to be synchronized.)

29 Source coding in data networks Principle of Data Fountains Example [Figure: the sender XORs the data chunks selected by each random bit-vector into packets; the receiver collects the packets together with the corresponding bit-vectors.] The receiver cannot start decoding the incoming packets before it has an invertible matrix. If this is not (yet) the case, the receiver gathers more packets (and more random data) until the inverse matrix can be calculated. The random bits are not sent but generated by two synchronized (e.g., by an index) random number generators.

30 Source coding in data networks Principle of Data Fountains Pro: A missing packet means a missing random bit-vector. However, the next packet will likely fill the gap. In a traditional transmission, a missing part of a file can be replaced by no packet other than the missing one. Many sources can contribute without mutual cooperation. Con: Inverting the large matrix might be time- and memory-consuming. To generate a single new packet, the sender has to XOR about 50% of the entire file. Improvement: don't use fully random bits for the transform but a matrix which is mostly populated along the diagonal.

31 Motivation for compression in data networks The bandwidth provided by the physical layer is usually not fully available to an application because of protocol overhead introduced by each layer of the network stack. By optimizing protocols we can try to exploit the resources better. However, not transmitting redundant data bears a much larger potential for increased efficiency, often several orders of magnitude larger than the wasted network bandwidth. Question: Every once in a while we can read in the press that a compression algorithm has been invented which can (successfully) compress its own output again (repeatedly). In particular, it is usually claimed that random data can be compressed. Why is this unlikely? There is a long history of inventors and companies who claim to have achieved the above. For details see

32 Handling huge data volumes
Text: 1 page with 80 characters/line, 64 lines/page and 1 byte/char results in 80 * 64 * 1 * 8 = ca. 41 kbit/page.
Still image: 24 bits/pixel, 512 x 512 pixel/image results in 512 x 512 x 24 = ca. 6 Mbit/image.
Audio: CD quality, sampling rate 44.1 kHz, 16 bits per sample results in 44.1 x 16 = 706 kbit/s. Stereo: 1.412 Mbit/s.
Video: Full-size frame 1024 x 768 pixel/frame, 24 bits/pixel, 30 frames/s results in 1024 x 768 x 24 x 30 = 566 Mbit/s. More realistic: 360 x 240 pixel/frame: 360 x 240 x 24 x 30 = 60 Mbit/s.
=> Storage and transmission of multimedia streams require compression!

33 [Slide largely lost to extraction; only fragments survive.] Example 1: ABC -> 1; EE -> 2. Example 2: (lost). Note that in this example both algorithms lead to the same compression rate.

34 Run Length Coding Principle: Replace all repetitions of the same symbol in the text ("runs") by a repetition counter and the symbol. Example Text: AAAABBBAABBBBBCCCCCCCCDABCBAABBBBCCD Encoding: 4A3B2A5B8C1D1A1B1C1B2A4B2C1D As we can see, we can only expect a good compression rate when long runs occur frequently. Examples are long runs of blanks in text documents or leading white pixels in gray-scale images.
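A C++ sketch of the encoder, reproducing the encoding above:

#include <iostream>
#include <string>

// Replace each run of identical symbols by (repetition counter, symbol).
std::string rle_encode(const std::string& text) {
    std::string out;
    for (std::size_t i = 0; i < text.size();) {
        std::size_t run = 1;
        while (i + run < text.size() && text[i + run] == text[i]) ++run;
        out += std::to_string(run) + text[i];
        i += run;
    }
    return out;
}

int main() {
    std::cout << rle_encode("AAAABBBAABBBBBCCCCCCCCDABCBAABBBBCCD") << std::endl;
    // prints 4A3B2A5B8C1D1A1B1C1B2A4B2C1D
}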

35 When dealing with binary files we are sure that a run of 1s is always followed by a run of 0s and vice versa. It is thus sufficient to store the repetition counters only! Example [the bit string on the slide is lost]: if it is agreed that the first counter refers to 0s, then 0000011110011111 can be encoded as 5, 4, 2, 5.

36 Run Length Coding, Legal Issues - Beware of the patent trap!
Run-length encoding of the type (length, character):
US Patent No: 4,586,027
Title: Method and system for data compression and restoration
Filed: 07-Aug-1984, Granted: 29-Apr-1986
Inventor: Tsukiyama et al., Assignee: Hitachi

Run-length encoding (length [<= 16], character):
US Patent No: 4,872,009
Title: Method and apparatus for data compression and restoration
Filed: 07-Dec-1987, Granted: 03-Oct-1989
Inventor: Tsukiyama et al., Assignee: Hitachi

37 Variable Length Coding Classical character codes use the same number of bits for each character. When the frequency of occurrence differs between characters, we can use fewer bits for frequent characters and more bits for rare characters. Example Code 1: the position of the letter in the alphabet as a 5-bit binary number (A = 00001, B = 00010, C = 00011, D = 00100, E = 00101, ..., R = 10010). Encoding of ABRACADABRA with constant bit length (= 5 bits): 11 characters x 5 bits = 55 bits. Code 2 (variable length), e.g.: A = 0, B = 1, R = 01, C = 10, D = 11. Encoding: 0 1 01 0 10 0 11 0 1 01 0 (only 15 bits — but where does one codeword end and the next begin?)

38 Delimiters Code 2 can only be decoded unambiguously when delimiters are stored with the codewords. This can increase the size of the encoded string considerably. Idea: No codeword should be the prefix of another codeword! We will then no longer need delimiters. Code 3: A = 11, B = 00, R = 011, C = 010, D = 10. Encoded string: 1100011110101110110001111 (25 bits).

39 Representation as a TRIE (or prefix tree) An obvious method is to represent such a code as a TRIE. In fact, any TRIE with M leaf nodes can be used to represent a code for a string containing M different characters. The figure on the next page shows two codes which can be used for ABRACADABRA. The code for each character is represented by the path from the root of the TRIE to that character, where 0 goes to the left and 1 goes to the right, as is the convention for TRIEs. The TRIE on the left corresponds to the encoding of ABRACADABRA on the previous page; the TRIE on the right generates an encoding which is two bits shorter.

40 Two Tries for our Example The TRIE representation guarantees indeed that no codeword is the prefix of another codeword. Thus the encoded bit string can be uniquely decoded.

41 Huffman Code Now the question arises how we can find the best variable-length code for given character frequencies (or probabilities). The algorithm that solves this problem was found by David Huffman in 1952. Algorithm Generate-Huffman-Code:
1. Determine the frequencies of the characters and mark the leaf nodes of a binary tree (to be built) with them.
2. Out of the tree nodes not yet marked as DONE, take the two with the smallest frequencies and compute their sum. Create a parent node for them and mark it with the sum. Mark the branch to the left son with 0, the one to the right son with 1. Mark the two son nodes as DONE.
3. When there is only one node not yet marked as DONE, stop (the tree is complete). Otherwise, continue with step 2.
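A C++ sketch of the algorithm; a min-priority queue serves as the set of nodes not yet marked DONE (allocated nodes are not freed, for brevity):

#include <iostream>
#include <map>
#include <queue>
#include <string>
#include <vector>

struct Node {
    double freq;
    char symbol;                       // meaningful for leaves only
    Node* left = nullptr;
    Node* right = nullptr;
};

struct ByFreq {                        // orders the queue by smallest frequency
    bool operator()(const Node* a, const Node* b) const { return a->freq > b->freq; }
};

// Step 2 of the algorithm, repeated: merge the two smallest nodes.
Node* build_huffman(const std::map<char, double>& freqs) {
    std::priority_queue<Node*, std::vector<Node*>, ByFreq> open;
    for (auto [c, f] : freqs) open.push(new Node{f, c});
    while (open.size() > 1) {
        Node* l = open.top(); open.pop();
        Node* r = open.top(); open.pop();
        open.push(new Node{l->freq + r->freq, 0, l, r});  // parent marked with sum
    }
    return open.top();
}

// 0 marks the branch to the left son, 1 the branch to the right son.
void print_codes(const Node* n, const std::string& code) {
    if (!n->left) { std::cout << n->symbol << ": " << code << std::endl; return; }
    print_codes(n->left, code + "0");
    print_codes(n->right, code + "1");
}

int main() {
    print_codes(build_huffman({{'A', 0.3}, {'B', 0.3}, {'C', 0.1},
                               {'D', 0.15}, {'E', 0.15}}), "");
}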

42 Huffman Code, Example Probabilities of the characters: p(A) = 0.3; p(B) = 0.3; p(C) = 0.1; p(D) = 0.15; p(E) = 0.15. [Figure: the resulting Huffman tree. C (10%) and D (15%) are merged first into a 25% node, which is then merged with E (15%) into a 40% node; A and B (30% each) form a 60% node; the root combines both into 100%.]

43 Huffman Code, why is it optimal? Characters with higher probabilities are closer to the root of the tree and thus have shorter codeword lengths; thus it is a good code. It is even the best possible code! Reason: The length of an encoded string equals the weighted outer path length of the Huffman tree. To compute the weighted outer path length we first compute the product of the weight (frequency counter) of a leaf node with its distance from the root. We then compute the sum of all these values over the leaf nodes. This is obviously the same as summing up the products of each character's codeword length with its frequency of occurrence. No other tree with the same frequencies attached to the leaf nodes has a smaller weighted path length than the Huffman tree.

44 Sketch of the Proof With a similar construction process, another tree could be built but without always combining the two nodes with the minimal frequencies. We can show by induction that no other such strategy will lead to a smaller weighted outer path length than the one that combines the minimal values in each step.

45 Decoding Huffman Codes (1) An obvious possibility is to use the TRIE: Read the input stream sequentially and traverse the TRIE until a leaf node is reached. When a leaf node is reached, output the character attached to it. To decode the next character, start again at the root of the TRIE. Observation: The input bit rate is constant, the output character rate is variable.

46 Decoding Huffman Codes (2) As an alternative we can use a decoding table. Creation of the decoding table: If the longest codeword has L bits, the table has 2^L entries. Let c_i be the codeword for character s_i. Let c_i have l_i bits. We then create 2^(L-l_i) entries in the table. In each of these entries the first l_i bits are equal to c_i, and the remaining bits take on all possible L-l_i binary combinations. At all these addresses of the table we enter s_i as the character recognized, and we remember l_i as the length of the codeword.
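A C++ sketch of the table construction (here with code 3 from the earlier slides, so L = 3; names are illustrative):

#include <cstdint>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

struct Entry { char symbol; int code_len; };

// Each codeword c_i of length l_i fills the 2^(L-l_i) table addresses whose
// first l_i bits equal c_i.
std::vector<Entry> build_table(const std::vector<std::pair<char, std::string>>& code,
                               int L) {
    std::vector<Entry> table(std::size_t(1) << L);
    for (const auto& [sym, bits] : code) {
        int l = static_cast<int>(bits.size());
        std::uint32_t prefix =
            static_cast<std::uint32_t>(std::stoul(bits, nullptr, 2)) << (L - l);
        for (std::uint32_t k = 0; k < (1u << (L - l)); ++k)
            table[prefix | k] = {sym, l};
    }
    return table;
}

int main() {
    auto table = build_table({{'A', "11"}, {'B', "00"}, {'R', "011"},
                              {'C', "010"}, {'D', "10"}}, 3);
    std::cout << table[0b011].symbol << std::endl;   // buffer 011 -> 'R', length 3
}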

47 Decoding with the Table Algorithm Table-Based Huffman Decoder:
1. Read L bits from the input stream into a buffer.
2. Use the buffer as the address into the table and output the recognized character s_i.
3. Remove the first l_i bits from the buffer and pull in the next l_i bits from the input bit stream.
4. Continue with step 2.
Observation: Table-based Huffman decoding is fast. The output character rate is constant, the input bit rate is variable.

48 Huffman Code, Comments A very good code for many practical purposes. Can only be used when the frequencies (or probabilities) of the characters are known in advance. Variation: Determine the character frequencies separately for each new document and store/transmit the code tree/table with the data. Note that a loss in optimality comes from the fact that each character must be encoded with an integer number of bits, and thus the codeword lengths do not match the frequencies exactly (consider a code for three characters A, B and C, each occurring with a frequency of 33%).

49 Critical Review of 0-Order Codes (1) The string ABCD ABCD ABCD ABCD is quite redundant but cannot be compressed by Huffman. Huffman is sometimes referred to as a 0-order entropy model, which means that each character is generated independently of its predecessor. No character makes a particular successor more likely. [Figure: transition diagram; from each current character A, B, C, D every next character A, B, C, D is equally likely.]

50 Critical Review of 0-Order Codes (2) A string of the type ABCD ABCD ABCD ABCD could be better described by a 1-order entropy model: the current character gives a hint on what character is expected next. [Figure: transition diagram; each current character has exactly one successor: A -> B, B -> C, C -> D, D -> A.] The example above is trivial since each character uniquely determines the next one.

51 Critical Review of 0-Order Codes (3) Most sources produce characters with varying relative occurrences which can be encoded with Huffman, but which also exhibit inter-character correlations. In the English language a c is often followed by an h, but not very often by a z. The example on the right consists of only four characters A, B, C, D, each with an assumed relative occurrence of 25% (used in the next slide). Once an A has occurred, only another A or a B will follow. As an exception, a D is always followed by another D. Let us see how many bits we must spend if this particular correlation is known.

52 Critical Review of 0-Order Codes (4) We assume that the occurrence of every character is equal (25%). Once it has occurred, the following two characters are equally likely (e.g., 50% for A->A and 50% for A->B).* Notation: X = current character; P(X) = probability of character X; Y|X = Y occurred and X was its predecessor; P(Y|X) = probability of Y if X was its predecessor.

X =                      A           B           C           D
P(X) =                   0.25        0.25        0.25        0.25
Y|X =                    A|A  B|A    B|B  C|B    C|C  D|C    D|D
P(Y|X) =                 0.5  0.5    0.5  0.5    0.5  0.5    1.0
-log2(P(Y|X)) =          1    1      1    1      1    1      0
Sum P(Y|X)(-log2(..)) =  1           1           1           0

H(Y|X) = 0.25*1 + 0.25*1 + 0.25*1 + 0.25*0 = 0.75 bits for coding a character (the first character must be given).

*Note: The example is not realistic. It was chosen for easy calculation.

53 Critical Review of 0-Order Codes (5) Number of bits needed in 1-order entropy models in general:

H(Y|X) = Σ_x P(x) * Σ_y P(y|x) * (-1) * log2(P(y|x))

P(y|x) is the probability of getting a character y next if we have currently seen an x; the inner sum is therefore the number of bits we need in order to code a Y if an X occurred already. In the outer sum, all possible occurrences of X are considered: x can be considered to be the first character, occurring with probability P(x), which determines the next character y. Note that in real-world examples P(y|x) can be zero very often, because many combinations of characters simply never occur. This is why we can save on bits!
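A short C++ check of this formula against the example from the previous slide (the transition probabilities are the ones assumed there):

#include <cmath>
#include <iostream>
#include <vector>

// H(Y|X) = sum_x P(x) * sum_y P(y|x) * (-1) * log2(P(y|x))
double conditional_entropy(const std::vector<double>& px,
                           const std::vector<std::vector<double>>& pyx) {
    double h = 0.0;
    for (std::size_t x = 0; x < px.size(); ++x)
        for (double p : pyx[x])
            if (p > 0.0)                        // combinations that never occur
                h += px[x] * p * -std::log2(p); // contribute nothing
    return h;
}

int main() {
    // rows = current character A, B, C, D; columns = next character A, B, C, D
    std::vector<std::vector<double>> pyx = {{0.5, 0.5, 0.0, 0.0},
                                            {0.0, 0.5, 0.5, 0.0},
                                            {0.0, 0.0, 0.5, 0.5},
                                            {0.0, 0.0, 0.0, 1.0}};
    std::cout << conditional_entropy({0.25, 0.25, 0.25, 0.25}, pyx)
              << " bits" << std::endl;          // prints 0.75 bits
}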

54 Lempel-Ziv Code Lempel-Ziv codes are an example of the large group of dictionary-based codes. Dictionary: A table of character strings which is used in the encoding process. Example: The word lecture is found on page x4, line y4 of the dictionary. It can thus be encoded as (x4,y4). A sentence such as this is a lecture could perhaps be encoded as a sequence of tuples (x1,y1) (x2,y2) (x3,y3) (x4,y4).

55 Dictionary-Based Coding Techniques Static techniques: The dictionary exists before a string is encoded. It is not changed, neither in the encoding nor in the decoding process. Dynamic techniques: The dictionary is created on the fly during the encoding process, at the sending (and sometimes also at the receiving) side. Lempel and Ziv have proposed an especially brilliant dynamic, dictionary-based technique (1977). Variations of this technique are used very widely today for lossless compression. An example is LZW (Lempel/Ziv/Welch), which is invoked with the Unix compress command. The well-known TIFF format (Tag Image File Format) is also based on Lempel-Ziv coding.

56 Ziv-Lempel Coding, the Principle The current piece of the message can be encoded as a reference to an earlier (identical) piece of the message. This reference will usually be shorter than the piece itself. As the message is processed, the dictionary is created dynamically.

InitializeStringTable();
WriteCode(ClearCode);
w = the empty string;
for each character in string {
    K = GetNextCharacter();
    if w + K is in the string table {
        w = w + K;    /* string concatenation */
    } else {
        WriteCode(CodeFromString(w));
        AddTableEntry(w + K);
        w = K;
    }
}
WriteCode(CodeFromString(w));

57 LZW, Example 1, Encoding Alphabet: {A, B, C} Message: A B A B C B A B A B

w     K    w+K   in table?   output    new entry
-     A    A     yes         -         -
A     B    AB    no          1 (A)     4 = AB
B     A    BA    no          2 (B)     5 = BA
A     B    AB    yes         -         -
AB    C    ABC   no          4 (AB)    6 = ABC
C     B    CB    no          3 (C)     7 = CB
B     A    BA    yes         -         -
BA    B    BAB   no          5 (BA)    8 = BAB
B     A    BA    yes         -         -
BA    B    BAB   yes         -         -
BAB   end  -     -           8 (BAB)   -

Encoded message: 1 2 4 3 5 8. Dictionary: 1 = A, 2 = B, 3 = C, 4 = AB, 5 = BA, 6 = ABC, 7 = CB, 8 = BAB.

58 LZW Algorithm: Decoding (1) Note that the decoding algorithm also creates the dictionary dynamically; the dictionary is not transmitted!

while ((Code = GetNextCode()) != EofCode) {
    if (Code == ClearCode) {
        InitializeTable();
        Code = GetNextCode();
        if (Code == EofCode)
            break;
        WriteString(StringFromCode(Code));
        OldCode = Code;
    } /* end of ClearCode case */
    else {
        if (IsInTable(Code)) {
            WriteString(StringFromCode(Code));
            AddStringToTable(StringFromCode(OldCode) +
                             FirstChar(StringFromCode(Code)));
            OldCode = Code;
        }

59 LZW Algorithm: Decoding (2)

        else { /* code is not in table */
            OutString = StringFromCode(OldCode) +
                        FirstChar(StringFromCode(OldCode));
            WriteString(OutString);
            AddStringToTable(OutString);
            OldCode = Code;
        }
    }
}

60 LZW, Example 2, Decoding Alphabet: {A, B, C, D} Transmitted code: 1 2 1 3 5 9

code   oldcode   output   new entry
1      -         A        -
2      1         B        5 = AB
1      2         A        6 = BA
3      1         C        7 = AC
5      3         AB       8 = CA
9      5         ABA      9 = ABA  (code not yet in table: old string + its first char)

Decoded message: A B A C AB ABA. Dictionary: 1 = A, 2 = B, 3 = C, 4 = D, 5 = AB, 6 = BA, 7 = AC, 8 = CA, 9 = ABA.

61 LZW, Properties The dictionary is created dynamically during the encoding and decoding process. It is neither stored nor transmitted! The dictionary adapts dynamically to the properties of the character string. With length N of the original message, the encoding process is of complexity O(N). With length M of the encoded message, the decoding process is of complexity O(M). These are thus very efficient processes. Since several characters of the input alphabet are combined into one character of the code, M <= N.
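A compact C++ version of the encoder (a hash map provides the dictionary lookups, giving the O(N) behaviour; codes start at 1 as in Example 1):

#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

std::vector<int> lzw_encode(const std::string& msg, const std::string& alphabet) {
    std::unordered_map<std::string, int> dict;
    for (std::size_t i = 0; i < alphabet.size(); ++i)
        dict[std::string(1, alphabet[i])] = i + 1;   // initial single-char codes
    std::vector<int> out;
    std::string w;
    for (char k : msg) {
        if (dict.count(w + k)) {
            w += k;                                  // extend the current match
        } else {
            out.push_back(dict[w]);
            int next_code = static_cast<int>(dict.size()) + 1;
            dict[w + k] = next_code;                 // new dictionary entry
            w = std::string(1, k);
        }
    }
    out.push_back(dict[w]);
    return out;
}

int main() {
    for (int c : lzw_encode("ABABCBABAB", "ABC"))
        std::cout << c << ' ';                       // prints 1 2 4 3 5 8
    std::cout << std::endl;
}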

62 Typical Compression Rates Typical examples of file sizes in % of the original size:

Type of file     Encoded with Huffman    Encoded with Lempel-Ziv
C source code    65%                     45%
machine code     80%                     55%
text             50%                     30%

63 Arithmetic Coding From an information theory point of view, the Huffman code is not quite optimal since a codeword must always consist of an integer number of bits even if this does not correspond exactly to the frequency of occurrence of the character. Arithmetic coding solves this problem. Idea An entire message is represented by a floating point number out of the interval [0,1). For this purpose the interval [0,1) is repeatedly subdivided according to the frequency of the next symbol. Each new sub-interval represents one symbol. When the process is completed the shortest floating point number contained in the target interval is chosen as the representative for the message.

64 Arithmetic Coding, the Algorithm
1. Begin in front of the first character of the input stream, with the current interval set to [0,1).
2. Read the next character from the input stream. Subdivide the current interval according to the frequencies of all characters of the alphabet. Select the subinterval corresponding to the current character as the next current interval.
3. If you reach the end of the input stream or the end symbol, go to step 4. Otherwise go to step 2.
4. From the current (final) interval, select the floating point number that you can represent in the computer with the smallest number of bits. This number is the encoding of the string.
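A C++ sketch of the interval subdivision with plain doubles (fine for short messages; choosing the cheapest representable number in the final interval is left out):

#include <iostream>
#include <map>
#include <string>
#include <utility>

// Repeatedly subdivide [low, high) according to the character frequencies
// and keep the sub-interval of the character that was read.
std::pair<double, double> arith_encode(const std::string& msg,
                                       const std::map<char, double>& prob) {
    double low = 0.0, high = 1.0;
    for (char c : msg) {
        double range = high - low, cum = 0.0;
        for (auto [sym, p] : prob) {
            if (sym == c) {
                high = low + range * (cum + p);   // upper end of sub-interval
                low = low + range * cum;          // lower end of sub-interval
                break;
            }
            cum += p;
        }
    }
    return {low, high};
}

int main() {
    auto [low, high] = arith_encode("ACB", {{'A', 0.2}, {'B', 0.3}, {'C', 0.5}});
    std::cout << "[" << low << ", " << high << ")" << std::endl; // [0.12, 0.15)
}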

65 Arithmetic Coding, the Decoding Algorithm Algorithm Arithmetic Decoding Subdivide the interval [0,1) according to the character frequencies, as described in the encoding algorithm, up to the maximum size of a message. The encoded floating point number uniquely identifies one particular subinterval. This subinterval uniquely identifies one particular message. Output the message.

66 Arithmetic Coding, Example Alphabet = {A,B,C}. Frequencies (probabilities): p(A) = 0.2; p(B) = 0.3; p(C) = 0.5. Messages: ACB, AAB, ... (maximum size of a message is 3). Encoding of the first block ACB:

Interval [0, 1):      A: [0, 0.2)     B: [0.2, 0.5)    C: [0.5, 1)     -> read A, keep [0, 0.2)
Interval [0, 0.2):    A: [0, 0.04)    B: [0.04, 0.1)   C: [0.1, 0.2)   -> read C, keep [0.1, 0.2)
Interval [0.1, 0.2):  A: [0.1, 0.12)  B: [0.12, 0.15)  C: [0.15, 0.2)  -> read B, keep [0.12, 0.15)

Final interval: [0.12, 0.15); choose e.g. 0.125 (binary 0.001, the representable number with the fewest bits).

67 Arithmetic Coding, Implementation So far we dealt with real numbers of (theoretically) infinite precision. How do we actually encode a message with many characters (like several megabytes)? If character A occurs with a probability of 20%, the number of digits for coding consecutive As grows very fast: first A < 0.2, second A < 0.04, third A < 0.008, and so on. Let us assume that our processor has 8-bit wide registers. The interval boundaries 0, 0.2, 0.5 and 1 then become the register values 0 (decimal 0), 51 (0.2 x 255 = 51), 127 (0.5 x 255 = 127) and 255 (decimal 255), so A covers [0, 51), B [51, 127) and C [127, 255]. Note: We skip storing or transmitting the leading "0." since it is redundant. Advantage: We have a binary fixed-point representation of the fractions and we can compute with them in the processor's registers.

68 Arithmetic Coding, Implementation Disadvantage: Even when using 32-bit or 64-bit CPU registers we can only code a couple of characters. Solution: Once our interval gets smaller and smaller, we will obtain a growing number of leading bits which have settled (they will never change). So we will transmit them to the receiver and shift them out of our register to gain new bits. Example: Interval of the first A = [00000000, 00110011]. No matter which characters follow, the most significant two zeros of the lower and the upper bound will never change. So we store or transmit them and shift the rest two digits to the left. As a consequence we gain two new least significant digits: A = [000000??, 110011??]

69 Arithmetic Coding, Implementation A = [000000??, 110011??] How do we initialize the new digits prior to the ongoing encoding? Obviously, we want to keep the interval as large as possible. This is achieved by filling the lower bound with 0-bits and the upper bound with 1-bits: A_new = [00000000, 11001111] Note that always adding 0-bits to the lower bound and 1-bits to the upper bound introduces a small error, because the size of the interval no longer corresponds exactly to the probability of character A. However, we will not get into trouble as long as the encoding and the decoding side make the same mistake.

70 Arithmetic Coding, Properties The encoding depends on the probabilities (relative occurrences) of the characters. The higher the frequency, the larger the subinterval and the smaller the number of bits needed to represent it. The code length reaches the theoretical optimum: the number of bits used for each character need not be an integer. It can approach the real probability better than the Huffman code. We always need a terminal symbol to stop the encoding process. Problem: One bit error destroys the entire message.


CS 493: Algorithms for Massive Data Sets Dictionary-based compression February 14, 2002 Scribe: Tony Wirth LZ77 CS 493: Algorithms for Massive Data Sets February 14, 2002 Dictionary-based compression Scribe: Tony Wirth This lecture will explore two adaptive dictionary compression schemes: LZ77 and LZ78. We use the

More information

Data Compression Fundamentals

Data Compression Fundamentals 1 Data Compression Fundamentals Touradj Ebrahimi Touradj.Ebrahimi@epfl.ch 2 Several classifications of compression methods are possible Based on data type :» Generic data compression» Audio compression»

More information

UNIT-II. Part-2: CENTRAL PROCESSING UNIT

UNIT-II. Part-2: CENTRAL PROCESSING UNIT Page1 UNIT-II Part-2: CENTRAL PROCESSING UNIT Stack Organization Instruction Formats Addressing Modes Data Transfer And Manipulation Program Control Reduced Instruction Set Computer (RISC) Introduction:

More information

FAULT TOLERANT SYSTEMS

FAULT TOLERANT SYSTEMS FAULT TOLERANT SYSTEMS http://www.ecs.umass.edu/ece/koren/faulttolerantsystems Part 6 Coding I Chapter 3 Information Redundancy Part.6.1 Information Redundancy - Coding A data word with d bits is encoded

More information

Networking Link Layer

Networking Link Layer Networking Link Layer ECE 650 Systems Programming & Engineering Duke University, Spring 2018 (Link Layer Protocol material based on CS 356 slides) TCP/IP Model 2 Layer 1 & 2 Layer 1: Physical Layer Encoding

More information

Scribe: Virginia Williams, Sam Kim (2016), Mary Wootters (2017) Date: May 22, 2017

Scribe: Virginia Williams, Sam Kim (2016), Mary Wootters (2017) Date: May 22, 2017 CS6 Lecture 4 Greedy Algorithms Scribe: Virginia Williams, Sam Kim (26), Mary Wootters (27) Date: May 22, 27 Greedy Algorithms Suppose we want to solve a problem, and we re able to come up with some recursive

More information

David Rappaport School of Computing Queen s University CANADA. Copyright, 1996 Dale Carnegie & Associates, Inc.

David Rappaport School of Computing Queen s University CANADA. Copyright, 1996 Dale Carnegie & Associates, Inc. David Rappaport School of Computing Queen s University CANADA Copyright, 1996 Dale Carnegie & Associates, Inc. Data Compression There are two broad categories of data compression: Lossless Compression

More information

Some portions courtesy Robin Kravets and Steve Lumetta

Some portions courtesy Robin Kravets and Steve Lumetta CSE 123 Computer Networks Fall 2009 Lecture 4: Data-Link I: Framing and Errors Some portions courtesy Robin Kravets and Steve Lumetta Administrative updates I m Im out all next week no lectures, but You

More information

ELEC 691X/498X Broadcast Signal Transmission Winter 2018

ELEC 691X/498X Broadcast Signal Transmission Winter 2018 ELEC 691X/498X Broadcast Signal Transmission Winter 2018 Instructor: DR. Reza Soleymani, Office: EV 5.125, Telephone: 848 2424 ext.: 4103. Office Hours: Wednesday, Thursday, 14:00 15:00 Slide 1 In this

More information

Fault-Tolerant Computing

Fault-Tolerant Computing Fault-Tolerant Computing Dealing with Mid-Level Impairments Oct. 2007 Error Detection Slide 1 About This Presentation This presentation has been prepared for the graduate course ECE 257A (Fault-Tolerant

More information

More Bits and Bytes Huffman Coding

More Bits and Bytes Huffman Coding More Bits and Bytes Huffman Coding Encoding Text: How is it done? ASCII, UTF, Huffman algorithm ASCII C A T Lawrence Snyder, CSE UTF-8: All the alphabets in the world Uniform Transformation Format: a variable-width

More information

CHW 261: Logic Design

CHW 261: Logic Design CHW 261: Logic Design Instructors: Prof. Hala Zayed Dr. Ahmed Shalaby http://www.bu.edu.eg/staff/halazayed14 http://bu.edu.eg/staff/ahmedshalaby14# Slide 1 Slide 2 Slide 3 Digital Fundamentals CHAPTER

More information

CS422 Computer Networks

CS422 Computer Networks CS422 Computer Networks Lecture 3 Data Link Layer Dr. Xiaobo Zhou Department of Computer Science CS422 DataLinkLayer.1 Data Link Layer Design Issues Services Provided to the Network Layer Provide service

More information

Figure-2.1. Information system with encoder/decoders.

Figure-2.1. Information system with encoder/decoders. 2. Entropy Coding In the section on Information Theory, information system is modeled as the generationtransmission-user triplet, as depicted in fig-1.1, to emphasize the information aspect of the system.

More information

ENEE x Digital Logic Design. Lecture 3

ENEE x Digital Logic Design. Lecture 3 ENEE244-x Digital Logic Design Lecture 3 Announcements Homework due today. Homework 2 will be posted by tonight, due Monday, 9/2. First recitation quiz will be tomorrow on the material from Lectures and

More information

Huffman Code Application. Lecture7: Huffman Code. A simple application of Huffman coding of image compression which would be :

Huffman Code Application. Lecture7: Huffman Code. A simple application of Huffman coding of image compression which would be : Lecture7: Huffman Code Lossless Image Compression Huffman Code Application A simple application of Huffman coding of image compression which would be : Generation of a Huffman code for the set of values

More information

Lossless compression II

Lossless compression II Lossless II D 44 R 52 B 81 C 84 D 86 R 82 A 85 A 87 A 83 R 88 A 8A B 89 A 8B Symbol Probability Range a 0.2 [0.0, 0.2) e 0.3 [0.2, 0.5) i 0.1 [0.5, 0.6) o 0.2 [0.6, 0.8) u 0.1 [0.8, 0.9)! 0.1 [0.9, 1.0)

More information

Module 2: Computer Arithmetic

Module 2: Computer Arithmetic Module 2: Computer Arithmetic 1 B O O K : C O M P U T E R O R G A N I Z A T I O N A N D D E S I G N, 3 E D, D A V I D L. P A T T E R S O N A N D J O H N L. H A N N E S S Y, M O R G A N K A U F M A N N

More information

Functional Programming in Haskell Prof. Madhavan Mukund and S. P. Suresh Chennai Mathematical Institute

Functional Programming in Haskell Prof. Madhavan Mukund and S. P. Suresh Chennai Mathematical Institute Functional Programming in Haskell Prof. Madhavan Mukund and S. P. Suresh Chennai Mathematical Institute Module # 02 Lecture - 03 Characters and Strings So, let us turn our attention to a data type we have

More information

15 Data Compression 2014/9/21. Objectives After studying this chapter, the student should be able to: 15-1 LOSSLESS COMPRESSION

15 Data Compression 2014/9/21. Objectives After studying this chapter, the student should be able to: 15-1 LOSSLESS COMPRESSION 15 Data Compression Data compression implies sending or storing a smaller number of bits. Although many methods are used for this purpose, in general these methods can be divided into two broad categories:

More information

Data link layer functions. 2 Computer Networks Data Communications. Framing (1) Framing (2) Parity Checking (1) Error Detection

Data link layer functions. 2 Computer Networks Data Communications. Framing (1) Framing (2) Parity Checking (1) Error Detection 2 Computer Networks Data Communications Part 6 Data Link Control Data link layer functions Framing Needed to synchronise TX and RX Account for all bits sent Error control Detect and correct errors Flow

More information

Error Detection Codes. Error Detection. Two Dimensional Parity. Internet Checksum Algorithm. Cyclic Redundancy Check.

Error Detection Codes. Error Detection. Two Dimensional Parity. Internet Checksum Algorithm. Cyclic Redundancy Check. Error Detection Two types Error Detection Codes (e.g. CRC, Parity, Checksums) Error Correction Codes (e.g. Hamming, Reed Solomon) Basic Idea Add redundant information to determine if errors have been introduced

More information

DLD VIDYA SAGAR P. potharajuvidyasagar.wordpress.com. Vignana Bharathi Institute of Technology UNIT 1 DLD P VIDYA SAGAR

DLD VIDYA SAGAR P. potharajuvidyasagar.wordpress.com. Vignana Bharathi Institute of Technology UNIT 1 DLD P VIDYA SAGAR UNIT I Digital Systems: Binary Numbers, Octal, Hexa Decimal and other base numbers, Number base conversions, complements, signed binary numbers, Floating point number representation, binary codes, error

More information

Multimedia Systems. Part 20. Mahdi Vasighi

Multimedia Systems. Part 20. Mahdi Vasighi Multimedia Systems Part 2 Mahdi Vasighi www.iasbs.ac.ir/~vasighi Department of Computer Science and Information Technology, Institute for dvanced Studies in asic Sciences, Zanjan, Iran rithmetic Coding

More information

Bits, Words, and Integers

Bits, Words, and Integers Computer Science 52 Bits, Words, and Integers Spring Semester, 2017 In this document, we look at how bits are organized into meaningful data. In particular, we will see the details of how integers are

More information

CS473-Algorithms I. Lecture 11. Greedy Algorithms. Cevdet Aykanat - Bilkent University Computer Engineering Department

CS473-Algorithms I. Lecture 11. Greedy Algorithms. Cevdet Aykanat - Bilkent University Computer Engineering Department CS473-Algorithms I Lecture 11 Greedy Algorithms 1 Activity Selection Problem Input: a set S {1, 2,, n} of n activities s i =Start time of activity i, f i = Finish time of activity i Activity i takes place

More information

Ad hoc and Sensor Networks Chapter 6: Link layer protocols. Holger Karl

Ad hoc and Sensor Networks Chapter 6: Link layer protocols. Holger Karl Ad hoc and Sensor Networks Chapter 6: Link layer protocols Holger Karl Goals of this chapter Link layer tasks in general Framing group bit sequence into packets/frames Important: format, size Error control

More information

Computer and Network Security

Computer and Network Security CIS 551 / TCOM 401 Computer and Network Security Spring 2009 Lecture 6 Announcements First project: Due: 6 Feb. 2009 at 11:59 p.m. http://www.cis.upenn.edu/~cis551/project1.html Plan for Today: Networks:

More information

Encoding. A thesis submitted to the Graduate School of University of Cincinnati in

Encoding. A thesis submitted to the Graduate School of University of Cincinnati in Lossless Data Compression for Security Purposes Using Huffman Encoding A thesis submitted to the Graduate School of University of Cincinnati in a partial fulfillment of requirements for the degree of Master

More information

Digital Fundamentals

Digital Fundamentals Digital Fundamentals Tenth Edition Floyd Chapter 2 2009 Pearson Education, Upper 2008 Pearson Saddle River, Education NJ 07458. All Rights Reserved Decimal Numbers The position of each digit in a weighted

More information

2 nd Week Lecture Notes

2 nd Week Lecture Notes 2 nd Week Lecture Notes Scope of variables All the variables that we intend to use in a program must have been declared with its type specifier in an earlier point in the code, like we did in the previous

More information