Multimedia Systems. Part 20. Mahdi Vasighi

Multimedia Systems Part 2 Mahdi Vasighi www.iasbs.ac.ir/~vasighi Department of Computer Science and Information Technology, Institute for dvanced Studies in asic Sciences, Zanjan, Iran

rithmetic Coding widely used entropy coder. Variable length source coding technique Only problem is its speed due possibly complex computations due to large symbol tables. Good compression ratio (better than Huffman coding), entropy around the Shannon ideal value. Here we describe basic approach of rithmetic Coding.

rithmetic Coding The idea behind arithmetic coding is: encode the entire message into a single real number, n, (. n <.). Consider a probability line segment, [... ), ssign to every symbol a range in this interval Range is proportional to probability with position at cumulative probability. Once we have defined the ranges and the probability line: Start to encode symbols. Every symbol defines where the output real number lands within the range.

rithmetic Coding ssume we have the following : C occurs with probability.5. and C with probabilities.25. Start by assigning each symbol to the probability range [... ). Sort symbols highest probability first: Symbol Range [.,.5) [.5..75) C [.75,.) The first symbol in our example stream is

rithmetic Coding The first symbol in our example stream is [.5..75) Subdivide the range for the first symbol For the second symbol (range =.25, low =.5, high =.75) Symbol Range [.5,.625) [.625..6875) C [.6875,.75) reapply the subdivision of our scale again to get for our third symbol:(range =.25, low =.5, high =.625): Symbol Range [.5,.5625) [.5625..59375) C [.59375,.625)

rithmetic Coding Subdivide again: (range =.325, low =.59375, high =.625): Symbol Range C [.59375,.6937) C [.6937..67875) CC [.67875,.625) So the (unique) output code for C is any number in the range: [.59375,.6937) This number is referred to as a tag.

rithmetic Coding..75.5 C=.25 =.25.75.6875.625 C=.25 =.25.625.59375.5625 C=.25 =.25.625.67875.6937 C=.25 =.25 =.5 =.5 =.5 =.5..5.5.59375 Sym Range Sym Range Sym Range Sym Range [.,.5) [.5,.625) [.5,.5625) C [.59375,.6937) [.5..75) [.625..6875) [.5625..59375) C [.6937..67875) C [.75,.) C [.6875,.75) C [.59375,.625) CC [.67875,.625)

rithmetic Coding Suppose the alphabet is [,,C, D, E, F, $] with known probability distribution. $ is a special symbol used to terminate the message We want to encode a of symbols CEE$

rithmetic Coding Suppose the alphabet is [,,C, D, E, F, $] with known probability distribution. We want to encode a of symbols CEE$ low= low + range Range_low(sym); high= low + range Range_high(sym);

rithmetic Coding EGIN low =.; high =.; range =.; initialize symbol; while (symbol $= terminator) { get (symbol); low = low + range * Range_low(symbol); high = low + range * Range_high(symbol); range = high - low; } output a code so that low <= code < high; END [.3384,.3322) range = P C P P E P E P $ =.2.2.3.3. =.36

rithmetic Coding inary fractional Decimal..5..25..25..625..33..56..78..39. treat the whole message as one unit 2 2 + 2 4 + 2 6 + 2 8 =.332325 [.3384,.3322) range = P C P P E P E P $ =.2.2.3.3. =.36

rithmetic Coding In the worst case, the shortest codeword in arithmetic coding will require k bits to encode a sequence of symbols: = rithmetic coding achieves better performance than Huffman coding but it has some limitations: long sequences of symbols: a very small range. It requires very high-precision numbers The encoder will not produce any output codeword until the entire sequence is entered. =

inary rithmetic Coding inary rithmetic Coding deals with two symbols only, and and uses binary fractions. Idea: Suppose alphabet was X, Y and consider stream: XXY Therefore: P(X) = 2/3, P(Y) = /3 For encoding length 2 messages, we can map all possible messages to intervals in the range [... ):

inary rithmetic Coding To encode message, just send enough bits of a binary fraction that uniquely specifies the interval. Message Codeword. X XX Y XY YX YY 4/9 6/9 8/9 2/4 3/4 5/6...

inary rithmetic Coding Similarly, we can map all possible length 3 messages to intervals range [... ) -log 2 p bits to represent interval of size p. -Log 2 (/27)=4.7549 5

Lempel-Ziv-Welch (LZW) lgorithm very common compression technique. Used in GIF files (LZW), dobe PDF file (LZW), Patented: LZW Patent expired in 23/24. asic idea/example Suppose we want to encode the Oxford Concise English dictionary which contains about 59, entries. 59 = 8 Why not just transmit each word as an 8 bit number?

Lempel-Ziv-Welch (LZW) lgorithm Problem Too many bits per word Everyone needs a dictionary to decode back to English. Only works for English text. Solution Find a way to build the dictionary adaptively. Original methods (LZ) due to Lempel and Ziv in 977. Terry Welch improvement (984), Patented LZW lgorithm LZW idea is that only the initial dictionary needs to be transmitted to enable decoding: The decoder is able to build the rest of the table from the encoded sequence.

Lempel-Ziv-Welch (LZW) lgorithm EGIN s = next input character; while not EOF { c = next input character; if s + c exists in the dictionary s = s + c; else { output the code for s; add s + c to the dictionary with a new code; s = c; } } output the code for s; END

Lempel-Ziv-Welch (LZW) lgorithm n example of a stream containing only two alphabets: Let us start with a very simple dictionary ( table) OUTPUT STRING TLE output code representing index

Lempel-Ziv-Welch (LZW) lgorithm s = c = OUTPUT STRING TLE output code representing index 2

Lempel-Ziv-Welch (LZW) lgorithm s = c = OUTPUT STRING TLE output code representing index 2 3

Lempel-Ziv-Welch (LZW) lgorithm s = c = OUTPUT STRING TLE output code representing index 2 3 2 4

Lempel-Ziv-Welch (LZW) lgorithm s = c = OUTPUT STRING TLE output code representing index 2 3 2 4 3 5

Lempel-Ziv-Welch (LZW) lgorithm s = c = OUTPUT STRING TLE output code representing index 2 3 2 4 3 5 6

Lempel-Ziv-Welch (LZW) lgorithm s = c = empty OUTPUT STRING TLE output code representing index 2 3 2 4 3 5 6 6

Lempel-Ziv-Welch (LZW) lgorithm The LZW decompressor creates the same table during decompression. decompress the output sequence of previous example: 2 3 ENCODER OUTPUT 2 codeword STRING TLE

Lempel-Ziv-Welch (LZW) lgorithm The LZW decompressor creates the same table during decompression. decompress the output sequence of previous example: 2 3 ENCODER OUTPUT 2 3 STRING TLE codeword

Lempel-Ziv-Welch (LZW) lgorithm The LZW decompressor creates the same table during decompression. decompress the output sequence of previous example: 2 3 ENCODER OUTPUT 2 3 4 STRING TLE codeword

Lempel-Ziv-Welch (LZW) lgorithm The LZW decompressor creates the same table during decompression. decompress the output sequence of previous example: 2 3 ENCODER OUTPUT 2 3 4 5 STRING TLE codeword