
Multimediale Visualisierungssysteme, WS 2000/2001
5. Compression

$\eta = \sum_i p_i \log_2 \frac{1}{p_i}$

Prof. Dr. Paul Müller
AG Integrierte Kommunikationssysteme (ICSY)
2000, 2001 Universität Kaiserslautern, Fachbereich Informatik, AG ICSY

How to contact
- Prof. Dr. Paul Müller, 34 / 32, Tel.: 63 / , mueller@uni-kl.de
- Bernd Reuther, 32 / 344, Tel.: 63 / , reuther@informatik.uni-kl.de
- Ye Yuan, 32 / 346, Tel.: 63 / , yuan@informatik.uni-kl.de

Acknowledgements
- Prof. Dr. Ralf Steinmetz, TU Darmstadt
- Stephan G. Eick, Bell Labs
- Dr. Andreas U. Mauthe, TecMath
- Dr. Peter Thomas, TecMath
- Prof. Dr. Hans Irtel, Uni Mannheim
- Universität Kaiserslautern: Dipl.-Inform. Bernd Reuther, Dipl.-Inform. Ye Yuan

Literature
- Guojun Lu: Communication and Computing for Distributed Multimedia Systems, Artech House, 1996
- Ralf Steinmetz: Multimedia-Technologie, Einführung und Grundlagen, Springer Verlag, 1993
- Borko Furht, Milan Milenkovic: A Guided Tour of Multimedia Systems and Applications, IEEE Computer Society Press, 1995
- François Fluckiger: Understanding Networked Multimedia, Applications and Technology, Prentice Hall, 1995
- Andrew S. Tanenbaum: Computer Networks, third edition, Prentice Hall, 1996


5. Compression

Why compression is required
- Example: a video sequence with 25 images/s, 3 byte/pixel, and an image resolution of 640 * 480 pixels
- Data rate = 640 * 480 * 3 byte * 25/s = 23,040,000 byte/s ≈ 22 MByte/s
- Compression is required.

General Requirements
1. low delay
2. low complexity (efficient implementation)
3. high quality

Mode Dependent Requirements
The requirements depend on the application type: dialogue mode or retrieval mode.
- Dialogue and retrieval mode requirements:
  - Independence of frame size and video frame rate
  - Synchronization of audio, video, and other media
- Dialogue mode requirements:
  - Compression and decompression in real time
  - End-to-end delay < 150 ms
- Retrieval mode requirements:
  - Fast forward and backward data retrieval
  - Random access within 1/2 s
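The data-rate example above is plain arithmetic and can be checked with a few lines. This is only a sketch; the function name `raw_video_rate` is chosen here for illustration and does not come from the slides.

```python
def raw_video_rate(width, height, bytes_per_pixel, frames_per_second):
    """Uncompressed video data rate in byte/s."""
    return width * height * bytes_per_pixel * frames_per_second


# Values from the slide: 640 x 480 pixels, 3 byte/pixel, 25 images/s
rate = raw_video_rate(640, 480, 3, 25)
rate_mbyte = rate / 2**20  # 1 MByte taken as 2**20 byte

print(f"{rate} byte/s = {rate_mbyte:.1f} MByte/s")
```

Taking 1 MByte as 2^20 byte reproduces the rounded figure of about 22 MByte/s given on the slide.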

Categories and Techniques

Entropy Coding
- Run-Length Coding
- Lempel-Ziv
- Huffman Coding, Arithmetic Coding, ...
- Adaptive Coding

Source Coding
- Prediction: DPCM, DM
- Transformation: FFT, DCT
- Layered Coding: Bit Position, Subsampling, Sub-Band Coding
- Vector Quantization

Hybrid Coding
- JPEG
- MPEG
- H.261
- DVI RTV, DVI PLV, ...

Entropy Coding

Entropy is the average information content of an information source S:

$H(S) = \eta = \sum_i p_i \log_2 \frac{1}{p_i}$

- $p_i$ is the probability of the occurrence of $S_i$ in S.
- $\log_2 \frac{1}{p_i}$ indicates the number of bits needed to code $S_i$.

Examples:
- A picture with a uniform distribution of four gray-scale values: $p_i = 1/4$; $\log_2 (1/p_i) = 2$; $H(S) = 2$.
- A picture with a non-uniform distribution of gray-scale values: $p_i = 1/8$ for $i = 1, \dots, 3$; $p_4 = 5/8$; $\log_2 (1/p_i) = 3$ for $i = 1, \dots, 3$; $\log_2 (1/p_4) \approx 0.678072$; $H(S) = 3 \cdot \frac{1}{8} \cdot 3 + \frac{5}{8} \cdot 0.678072 \approx 1.549$.

Entropy Coding Example: letter statistics compared to Morse code

Letter  Probability (German)  Probability (English)  Morse code
e       6,65%                 2,4%                   .
n       ,36%                  6,4%                   -.
i       8,4%                  6,46%                  ..
t       5,43%                 8,9%                   -
a       5,5%                  8,9%                   .-
o       2,25%                 8,3%                   ---
x       ,3%                   ,2%                    -..-
y       ,3%                   2,4%                   -.--
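The two gray-scale examples can be reproduced with a direct implementation of the entropy formula. This is a minimal sketch; the helper name `entropy` is an assumption made for illustration.

```python
import math


def entropy(probabilities):
    """Average information content in bits: sum of p * log2(1/p)."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)


uniform = [1 / 4] * 4                 # four equally likely gray-scale values
non_uniform = [1 / 8, 1 / 8, 1 / 8, 5 / 8]

h_uniform = entropy(uniform)          # 2 bits per pixel
h_non_uniform = entropy(non_uniform)  # about 1.549 bits per pixel
print(h_uniform, round(h_non_uniform, 3))
```

The non-uniform picture carries less information per pixel, which is exactly what an entropy coder exploits.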

Run-length Coding

Run-length coding compresses runs of the same successive symbol.

Example:
  input:  a b b c d e e e e e e e a x b c
  coded:  a b b c d x 7 e a x x b c
- x marks the beginning of a coded sequence.
- Special coding is required to code x as a symbol; here "x x" is used.
- How to code a bit stream? Zero suppression is a special form of run-length coding.

Lempel-Ziv (LZ77)

Algorithm for the compression of character sequences:
- Assumption: sequences of characters are repeated.
- Idea: replace a character sequence by a reference to an earlier occurrence.

1. Define
   - search buffer = already encoded data
   - look-ahead buffer = not yet encoded data
2. Find the longest match between the first characters of the look-ahead buffer and an arbitrary character sequence in the search buffer.
3. Produce the output <offset, length, next_character>
   - offset + length = reference to the earlier occurrence
   - next_character = the first character following the match in the look-ahead buffer

Example:

Pos   1 2 3 4 5 6 7 8 9
Char  A A B C B B A B C

Step  Pos  Match  Char  Output
1     1    -      A     <0,0,A>
2     2    A      B     <1,1,B>
3     4    -      C     <0,0,C>
4     5    B      B     <2,1,B>
5     7    AB     C     <5,2,C>

Remarks:
- The search and look-ahead buffers have a limited size.
  - The number of bits needed to encode pointers and length information depends on the buffer sizes.
  - Worst case: the character sequences are longer than one of the buffers.
  - Typical sizes are 4-64 KB.
- Sometimes other representations of the triple are used:
  - next_character only if necessary (i.e. no match found)
  - enabling dynamic change of buffer sizes
- LZ77 or its variants are often used before entropy coding:
  - LZ77 + Huffman coding is used by gzip and for PNG graphics.
  - GIF uses only LZW (a variant of LZ78, the successor of LZ77).
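The three LZ77 steps can be sketched as follows. This is an illustrative, unoptimized version rather than the implementation used by any particular tool; the function names and the default buffer sizes are assumptions. The offset is counted backwards from the current position, as in the example table.

```python
def lz77_encode(data, search_size=4096, lookahead_size=16):
    """Encode a string as a list of (offset, length, next_character) triples."""
    triples = []
    pos = 0
    while pos < len(data):
        best_offset, best_length = 0, 0
        start = max(0, pos - search_size)
        # Keep one character in reserve so that next_character always exists.
        max_length = min(lookahead_size, len(data) - pos - 1)
        for candidate in range(start, pos):
            length = 0
            while (length < max_length
                   and data[candidate + length] == data[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = pos - candidate, length
        triples.append((best_offset, best_length, data[pos + best_length]))
        pos += best_length + 1
    return triples


def lz77_decode(triples):
    """Rebuild the original string from the triples."""
    out = []
    for offset, length, next_character in triples:
        for _ in range(length):
            out.append(out[-offset])  # copy from the already decoded data
        out.append(next_character)
    return "".join(out)


triples = lz77_encode("AABCBBABC")
print(triples)
```

For the input `AABCBBABC` the sketch produces the same five triples as the example table, and `lz77_decode` restores the input.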

Huffman Coding

Use short bit patterns for frequently used symbols.
1. Sort all symbols by probability.
2. Take the two symbols with the lowest probability, remove them from the list, and insert a parent symbol into the sorted list whose probability is the sum of both symbols.
3. If at least two elements are left, continue with step 2.
4. Assign 0 and 1 to the branches of the tree.

Example: 1380 bits are used for regular coding (460 symbols with 3 bits each).

Symbol  Count
E       135
A       120
D       110
B       50
C       45

Step 1: P1(95) = B(50) + C(45)

Symbol  Count
E       135
A       120
D       110
P1      95

Step 2: P2(205) = D(110) + P1(95)

Symbol  Count
P2      205
E       135
A       120

Step 3: P3(255) = E(135) + A(120)

Symbol  Count
P3      255
P2      205
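The construction steps listed above can be sketched with a priority queue. This is a minimal sketch that only tracks code lengths, not the actual bit patterns; the function name `huffman_code_lengths` is chosen for illustration, and the symbol counts are those of the example (E 135, A 120, D 110, B 50, C 45).

```python
import heapq
from itertools import count


def huffman_code_lengths(counts):
    """Return {symbol: code length in bits} for a Huffman code."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(weight, next(tiebreak), {symbol: 0})
            for symbol, weight in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Step 2: merge the two entries with the lowest weight.
        w1, _, group1 = heapq.heappop(heap)
        w2, _, group2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**group1, **group2}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


counts = {"E": 135, "A": 120, "D": 110, "B": 50, "C": 45}
lengths = huffman_code_lengths(counts)
total_bits = sum(counts[s] * lengths[s] for s in counts)
fixed_bits = 3 * sum(counts.values())
print(lengths, total_bits, fixed_bits)
```

Comparing `total_bits` with `fixed_bits` shows the saving of the variable-length code over a fixed 3-bit code for the same symbol counts.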

Huffman Coding

Step 4: P4(460) = P3(255) + P2(205)

Resulting tree:
- P4(460)
  - P3(255): E(135), A(120)
  - P2(205): D(110), P1(95)
    - P1(95): B(50), C(45)

Symbol  Count  Huffman code length
E       135    2 bits
A       120    2 bits
D       110    2 bits
B       50     3 bits
C       45     3 bits

1015 bits are used for Huffman coding.

Adaptive Algorithms

- The previous algorithm requires statistical knowledge, which is often not available (e.g. live audio, video).
- Even when statistical knowledge is available, sending a large coding table can cause considerable overhead.
- Solution: use adaptive algorithms, e.g. Adaptive Huffman Coding.

Adaptive Huffman Coding

Encoder:
    initialize_model();
    while ((c = getc(input)) != eof) {
        encode(c, output);
        update_model(c);
    }

Decoder:
    initialize_model();
    while ((c = decode(input)) != eof) {
        putc(c, output);
        update_model(c);
    }

Important: encoder and decoder have to use exactly the same initialize_model and update_model routines.

update_model does two things:
1. increment the count
2. update the resulting Huffman tree
- During the updates the Huffman tree maintains its sibling property, i.e. the nodes are arranged in order of increasing weights.
- When swapping is necessary, the farthest node with weight W is swapped with the node whose weight has just been increased to W+1. (If the node with weight W has a subtree beneath it, the subtree goes with it.)

Example: n-th Huffman tree (nodes numbered in order of increasing weight)
- 9. W=17 (root)
  - 7. W=7
    - 5. W=3: 1. A (W=1), 2. B (W=2)
    - 6. W=4: 3. C (W=2), 4. D (W=2)
  - 8. E (W=10)
Resulting code for A: 000 (0 for the left branch, 1 for the right branch)

Adaptive Huffman Coding

(n+2)-th Huffman tree: A was incremented twice, nodes A and D were swapped.
- 9. W=19 (root)
  - 7. W=9
    - 5. W=4: 1. D (W=2), 2. B (W=2)
    - 6. W=5: 3. C (W=2), 4. A (W=3)
  - 8. E (W=10)
New code for A: 011

A was incremented twice again:
- 9. W=21 (root)
  - 7. W=11
    - 5. W=4: 1. D (W=2), 2. B (W=2)
    - 6. W=7: 3. C (W=2), 4. A (W=5)
  - 8. E (W=10)
The 4th (A) and the 5th node have to swap.

Resulting tree after the 1st swap:
- 9. W=21 (root)
  - 7. W=11
    - 5. A (W=5)
    - 6. W=6
      - 3. C (W=2)
      - 4. W=4: 1. D (W=2), 2. B (W=2)
  - 8. E (W=10)
The 8th (E) and the 7th node have to swap.

(n+4)-th Huffman tree:
- 9. W=21 (root)
  - 7. E (W=10)
  - 8. W=11
    - 5. A (W=5)
    - 6. W=6
      - 3. C (W=2)
      - 4. W=4: 1. D (W=2), 2. B (W=2)
New code for A: 10
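The encoder/decoder loops of adaptive Huffman coding can be illustrated with a deliberately simplified sketch. Note the assumption: instead of the incremental node swapping shown in the trees above, this version rebuilds the whole Huffman code from the current counts after every symbol. That is slower, but it shows the essential point that encoder and decoder stay synchronized because they use the same initialization and the same model update. All names (`build_codes`, `adaptive_encode`, `adaptive_decode`) are hypothetical.

```python
import heapq


def build_codes(counts):
    """Deterministic Huffman code table {symbol: bit string} from counts."""
    heap = [(w, i, s) for i, (s, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
        next_id += 1
    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):      # inner node: (left, right)
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                            # leaf: a symbol
            codes[tree] = prefix or "0"

    walk(heap[0][2], "")
    return codes


def adaptive_encode(text, alphabet):
    counts = {s: 1 for s in alphabet}            # initialize_model
    bits = []
    for c in text:
        bits.append(build_codes(counts)[c])      # encode with current model
        counts[c] += 1                           # update_model
    return "".join(bits)


def adaptive_decode(bits, alphabet, n_symbols):
    counts = {s: 1 for s in alphabet}            # same initialization
    out, pos = [], 0
    for _ in range(n_symbols):
        inverse = {code: s for s, code in build_codes(counts).items()}
        end = pos + 1
        while bits[pos:end] not in inverse:      # prefix code: first hit wins
            end += 1
            if end > len(bits):
                raise ValueError("truncated or corrupt bit stream")
        symbol = inverse[bits[pos:end]]
        out.append(symbol)
        counts[symbol] += 1                      # same update as the encoder
        pos = end
    return "".join(out)


message = "ABRACADABRA"
encoded = adaptive_encode(message, "ABCDR")
decoded = adaptive_decode(encoded, "ABCDR", len(message))
print(encoded, decoded)
```

The decoder here is told the number of symbols; a real stream would use an end-of-file symbol as in the pseudocode above.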


Huffman Coding Valuation

Advantages
- Huffman codes are optimal according to information theory for codes that assign a whole number of bits to each single symbol.
- The algorithm is simple.

Disadvantages
- Huffman coding is designed to code single characters only. Therefore at least one bit is required per character, e.g. a word of 8 characters requires at least an 8-bit code.
- It is not suitable for strings of different length, or for changing probabilities of characters in a different context, respectively.

Examples for both interpretations of that problem:
- Huffman coding does not support different probabilities for "c", "h", "s" and "sch".
- For a usual German text p("e") > p("h") is valid, but if the preceding characters have been "sc", then p("h") > p("e") is valid.

(Unsatisfying) solutions for both scenarios:
- Use a special coding where "sch" is one character only, but this requires knowledge about frequent character combinations in advance.
- Use different Huffman codes with respect to the context, but this leads to large code tables which must be appended to the coded data.

Arithmetic Coding

- Arithmetic coding generates codes that are optimal according to information theory and supports strings of different length.
- The probabilistic model is separated from the encoding operation:
  - Probabilistic model: assign a codeword to each possible string. The codewords consist of half-open subintervals of the interval [0,1).
  - Encoding: a given subinterval can be uniquely identified by any value of that interval. Use the value with the shortest binary representation to identify the subinterval.
- In practice, the subinterval is refined incrementally using the probabilities of the individual events or of event sequences.
- If strings of different length are used, extra information is needed to detect the end of a string.

Arithmetic Encoding Example / Arithmetic Decoding Example

[Figure: subdivision of the interval [0,1) according to the symbol probabilities, with a decoding table mapping input bits to output symbols]

Termination rules:
- "a" and "!" are termination symbols.
- Strings starting with "b" have a maximum length of 4.
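The incremental refinement of the subinterval can be sketched with exact fractions. The symbol probabilities below (a: 1/2, b: 1/4, !: 1/4) are assumptions made for this sketch, since the values of the slide example did not survive; the function names are hypothetical as well. The decoder is given the string length instead of using termination rules.

```python
from fractions import Fraction


def cumulative(probs):
    """Map each symbol to its half-open subinterval [low, high) of [0,1)."""
    table, low = {}, Fraction(0)
    for symbol, p in probs.items():
        table[symbol] = (low, low + p)
        low += p
    return table


def encode_interval(message, probs):
    """Return the subinterval [low, high) that identifies the message."""
    table = cumulative(probs)
    low, width = Fraction(0), Fraction(1)
    for symbol in message:
        s_low, s_high = table[symbol]
        low += width * s_low            # move into the symbol's slice
        width *= s_high - s_low         # shrink by the symbol's probability
    return low, low + width


def decode_value(value, length, probs):
    """Decode `length` symbols from any value inside the message interval."""
    table = cumulative(probs)
    out = []
    for _ in range(length):
        for symbol, (s_low, s_high) in table.items():
            if s_low <= value < s_high:
                out.append(symbol)
                value = (value - s_low) / (s_high - s_low)  # rescale to [0,1)
                break
    return "".join(out)


probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "!": Fraction(1, 4)}
low, high = encode_interval("bba!", probs)
print(low, high, decode_value(low, 4, probs))
```

The width of the final interval equals the product of the symbol probabilities, so a likely string gets a wide interval and therefore a short binary identifier.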


Source Coding: images

Take into account the specific data characteristics and the human sensitivity to that type of data.

Example: properties of human vision
- Luminance is the overall response to the total light energy; it depends on the energy and the wavelength of the light. The human eye is much more sensitive to changes of luminance than to changes of wavelength.
- Lightness is the whiteness of a specific object.
  - An object reflecting less than 3% of the incident light is perceived as black.
  - An object reflecting more than 80% of the incident light is perceived as white.
- Light of different wavelengths does not just appear as different colours. The human eye is more sensitive to certain wavelengths than to others: it is much more sensitive to yellow or yellow-green than to red or violet.

Transformation and Quantization

The RGB image representation does not allow direct access to all image attributes. Transform the image into more appropriate representations:
- To colour spaces that separate luminance and colour:
  - YUV or YIQ
  - LAB
- To a frequency space that represents changes within the picture:
  - Discrete Cosine Transformation (DCT)
  - Wavelet decomposition

In the new representation the image attributes are quantized to appropriate values, i.e. image information is reduced.

Discrete Cosine Transformation (DCT)

- Typical pictures have minimal changes in colour or luminance between two adjacent pixels.
- A frequency representation describes the amount of variation.
- DCT is a transformation between absolute values and a frequency representation.
- DCT can be used for source coding.

[Figure: basis pictures of the one-dimensional DCT]
[Figure: basis pictures of the two-dimensional DCT]
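A one-dimensional DCT is short enough to write out directly. The sketch below uses the orthonormal DCT-II and its inverse, which is one common normalization and an assumption here, since the slides do not state which variant they use. It shows that a smooth row of pixel values is represented almost entirely by the low-frequency coefficients.

```python
import math


def dct(samples):
    """Orthonormal DCT-II of a list of sample values."""
    n = len(samples)
    coefficients = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, x in enumerate(samples))
        coefficients.append(scale * total)
    return coefficients


def idct(coefficients):
    """Inverse of dct(): rebuild the sample values."""
    n = len(coefficients)
    samples = []
    for i in range(n):
        total = coefficients[0] * math.sqrt(1 / n)
        total += sum(math.sqrt(2 / n) * c
                     * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for k, c in enumerate(coefficients) if k > 0)
        samples.append(total)
    return samples


row = [100, 102, 104, 106, 108, 110, 112, 114]  # slowly changing pixel values
coeffs = dct(row)
restored = idct(coeffs)
print([round(c, 2) for c in coeffs])
```

The first (DC) coefficient dominates, and the transform itself loses nothing beyond floating-point rounding, which is why the actual data reduction has to come from a later quantization step.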


Discrete Cosine Transformation (DCT)

- Assumption: changes of colour and luminance between adjacent pixels are low.
  - DCT is applicable for photos.
  - DCT is not applicable for vector graphics or two-coloured pictures (black text, white background).
- DCT itself is nearly lossless: information is lost only because of precision limits.
- The result of the DCT is typically quantized:
  - Coefficients are divided by defined values to reduce precision.
  - Two approaches:
    - Use the same divisor for the whole matrix.
    - Use a special divisor for each matrix position; this allows weighting the coefficients. Typically small divisors are used for the low frequencies and higher divisors for the high frequencies.
- During quantization information is lost; the divisor defines the amount of lost information.

DCT + Quantization

a) The original image representation.
b) Result of the DCT: nearly all information of the picture is represented by low frequencies, the rest is nearly zero.
c) Result of the quantization: the precision of the values has been reduced, most values are exactly zero now. This is the result of the source coding.
d) Result of the decompression process.

Hybrid Coding Steps for Images, Video and Audio

Basic encoding steps: source image -> image preparation -> transformation -> quantization -> entropy encoding -> compressed image
- Image preparation (video: lossy, audio: lossless), e.g. resolution, frame rate
- Transformation and quantization (lossy, sometimes lossless)
  - Transformation, e.g. DCT, subband coding
  - Quantization, e.g. linear, DC and AC values
- Entropy encoding (lossless), e.g. run length, Huffman

Image preparation

The colour representation may take human sensitivity into account.

Example: RGB -> YUV, PAL-norm colour representation:

$Y = 0.30\,R + 0.59\,G + 0.11\,B$
$U = 0.493\,(B - Y) \approx -0.15\,R - 0.29\,G + 0.44\,B$
$V = 0.877\,(R - Y) \approx 0.62\,R - 0.52\,G - 0.10\,B$

- Y is the luminance, U and V are the chrominance.
- The precision of luminance (Y) may be higher than the precision of chrominance (U + V).
- This image preparation may also include simple source coding.
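The RGB to YUV conversion can be sketched directly from the definitions U = 0.493 (B - Y) and V = 0.877 (R - Y), which avoids the rounding in the expanded coefficients. The function names are chosen for illustration only.

```python
def rgb_to_yuv(r, g, b):
    """PAL-style conversion: luminance Y and chrominance U, V."""
    y = 0.30 * r + 0.59 * g + 0.11 * b
    u = 0.493 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


def yuv_to_rgb(y, u, v):
    """Invert rgb_to_yuv by solving the three equations for R, G, B."""
    b = y + u / 0.493
    r = y + v / 0.877
    g = (y - 0.30 * r - 0.11 * b) / 0.59
    return r, g, b


gray = rgb_to_yuv(128, 128, 128)    # no colour: U and V vanish
orange = rgb_to_yuv(255, 128, 0)
back = yuv_to_rgb(*orange)
print(gray, orange, back)
```

For a gray pixel both chrominance values are zero, which illustrates why all brightness information ends up in the Y plane and the U and V planes can be stored with less precision.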


Color sampling ratio for YUV

a:b:c notation for the sampling ratio according to ITU-R (former CCIR) recommendation 601-2, "Encoding parameters of digital television for studios":

Ratio   Downsampling
4:4:4   no loss of information
4:2:2   2:1 horizontal downsampling, no vertical downsampling (= 4 Y samples for every 2 U and 2 V samples per scanline)
4:1:1   4:1 horizontal downsampling, no vertical downsampling (= 4 Y samples for every 1 U and 1 V sample per scanline)
4:2:0   2:1 horizontal downsampling, 2:1 vertical downsampling (= 4 Y samples for every 2 U or 2 V samples per scanline; U and V are processed on alternating scanlines only)

[Figure: color sampling example]

Hybrid Coding: JPEG

- International standard
- JPEG = Joint Photographic Experts Group
- Coding/decoding of continuous-tone still images

Image preparation and coding:
1. Input data: R, G, B
2. Image preparation: Y, U, V planes
3. Split the planes into n 8x8 blocks
4. DCT for each block
5. Quantization (e.g. division by a constant)
6. Zig-zag scan
7. Entropy coding

Quantization

- Typically realized as a reduction of number precision.
- Quantization leads to loss of information.
- Division by a Q-factor:
  - fixed Q-factor for the whole picture
  - fixed Q-factor for a block, variable between blocks (used if a predefined limit of data must not be exceeded)
- Spectral quantization:
  - variable Q-factors within a block; the table of Q-factors is fixed for the whole picture
  - low Q-factors for important DCT coefficients and vice versa
  - different tables of Q-factors for the Y and the U, V planes
- JPEG does not define Q-factors, but there are recommended Q-factor tables for spectral quantization.
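The quantization and zig-zag steps of the pipeline above can be sketched as follows. The 8x8 coefficient block and the uniform Q-factor of 16 are made-up illustration values, not data from the slides or from the JPEG standard, and the function names are hypothetical.

```python
def zigzag_order(n=8):
    """(row, column) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):           # s = row + column of a diagonal
        diagonal = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diagonal.reverse()           # even diagonals run bottom-left to top-right
        order.extend(diagonal)
    return order


def quantize(block, q_table):
    """Divide each coefficient by its Q-factor and round to an integer."""
    return [[round(c / q) for c, q in zip(row, q_row)]
            for row, q_row in zip(block, q_table)]


def zigzag_scan(block):
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Hypothetical DCT result: energy concentrated in the low frequencies
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[7][7] = 320, -48, 25, 3

q_table = [[16] * 8 for _ in range(8)]   # same Q-factor for the whole block
scan = zigzag_scan(quantize(block, q_table))

trailing_zeros = 0
for value in reversed(scan):
    if value != 0:
        break
    trailing_zeros += 1
print(scan[:6], trailing_zeros)
```

After quantization the small high-frequency coefficient disappears, and the zig-zag scan collects the zeros into one long run at the end, which is what makes the following run-length step effective.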

JPEG Examples

[Figure: example images compressed as JPEG and as PNG, each with its file size]

JPEG Summary

- Different resolutions of individual components are possible.
- Image preparation uses the YUV representation.
- Lossless / lossy modes:
  - The loss of information by the DCT is minimal.
  - The loss of information by quantization may be significant, but the amount of lost data is adjustable.
- Quantization: the JPEG standard defines two default quantization tables, one for luminance and one for chrominance.
- Entropy coding:
  - First step: run-length coding; the zig-zag scan leads to a sequence of zeros at the end.
  - Second step: Huffman or arithmetic coding.

Wavelet image compression

Like the DCT, wavelet image compression divides images into different wave bands.
- DCT
  - Frequencies are represented by cosine waves.
  - Applied to 8x8 samples only, to reduce complexity.
- Wavelet (= a small wave)
  - Frequencies are represented by wavelets.
  - Applied to high frequencies only, to reduce complexity.
  - In a recursive process low frequencies are doubled and encoded like higher frequencies (= multi-scale analysis).


Wavelet image compression

- $\tilde{H}(x)$ is a low-pass filter, $\tilde{G}(x)$ is a high-pass filter (defined by the wavelet).
- The filters are applied to rows and columns.
- Restart the process with $f_{LL}(x, y)$, optionally also for the other images.

Wavelet examples

Mexican Hat wavelet:
$w(x) = (1 - x^2)\, e^{-x^2/2}$

Haar wavelet:
$w(x) = \begin{cases} 1 & \text{if } 0 \le x < 0.5 \\ -1 & \text{if } 0.5 \le x < 1 \\ 0 & \text{otherwise} \end{cases}$

- Wavelets should be simple.
- $w(x) \ne 0$ only for x "near" 0.
- The local change of colour can be described by a relocated and deformed wavelet.

Wavelet compression Example + Info

- Decompression video example (original from University of Bristol, UK)
- More information: Amara's Wavelet page
- Software: WaveConvert / sawave (Uni-Rostock)

Fractal image compression

- Idea: compression of images by using self-similarities.
- Describe image fragments as modified other image fragments:
  - scaling, moving, rotating or flipping
  - modification of contrast and brightness
- Mathematical background, a result from Banach's fixed-point theorem:
  - The fractal can be constructed by iterating the same process.
  - The image is the fixpoint.
  - Fractal compression means finding a function converging to that fixpoint.
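The recursive low-pass/high-pass split can be illustrated in one dimension with the Haar wavelet. The sketch uses plain averages and differences of neighbouring samples as low-pass and high-pass filters; this normalization and the function names are assumptions made for illustration, and a real image codec would apply the filters to rows and columns.

```python
def haar_step(signal):
    """Split an even-length signal into low-pass and high-pass halves."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [(a + b) / 2 for a, b in pairs]    # averages: coarse signal
    high = [(a - b) / 2 for a, b in pairs]   # differences: local detail
    return low, high


def haar_inverse(low, high):
    """Rebuild the signal from one decomposition step."""
    signal = []
    for average, difference in zip(low, high):
        signal += [average + difference, average - difference]
    return signal


def haar_decompose(signal):
    """Multi-scale analysis: keep splitting the low-pass part."""
    details = []
    while len(signal) > 1:
        signal, high = haar_step(signal)
        details.append(high)
    return signal[0], details


samples = [4, 6, 10, 12, 8, 8, 0, 2]
low, high = haar_step(samples)
overall_average, details = haar_decompose(samples)
print(low, high, overall_average)
```

In smooth regions the high-pass values are small and compress well, while the recursion on the low-pass part captures coarser and coarser structure.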

Fractal image compression

- Fractal image compression in general is very complex.
- Method to reduce complexity:
  - recursive partition of the image into rectangles containing little information
  - describe one rectangle as a modification of another
- Remarks:
  - No pixel values are stored; the compression results are iterative functions describing the image.
  - The quality of the resulting image cannot be guaranteed for a given number of iterations.

Fractal compression Examples + Info

- [Figure: iteration steps of decompression]
- More information: Fractal Image Encoding (Yuval Fisher) (…edu/y/fractals/)
- Software: fracomp (http://www-vs.informatik.uni-ulm.de/mitarbeiter/kassler/fractals.…), Fractal Imager

MPEG

- International standard of ISO, 1993
- MPEG = Moving Picture Experts Group
- Today the first standard is often called MPEG-1.
- Definition of
  - compression algorithms
  - formats for storing or transmitting compressed media
- Includes video and audio coding
- Properties:
  - Asymmetric codec: the encoder is more complex than the decoder.
  - The resulting bit rate is limited: 1.5 Mbit/s for video, 448 kbit/s for audio.

MPEG video formats

- Image preparation: separation of luminance and chrominance (similar to YUV using 4:2:0)
- More attributes are required than for JPEG:
  - Ratio of width and height of the pixels, to support different TV formats, for example:
    - 4:3 for PAL
    - 16:9 for HDTV (Europe and USA)
  - Picture frequency:
    - 25 Hz or 30 Hz for interlaced PAL or NTSC
    - 50 Hz and 60 Hz for non-interlaced video
    - 23.976 Hz is the lowest possible frequency

MPEG layered video stream

1. Sequence Layer: controls the buffering of video data; each sequence starts with the constant bit rate and the amount of memory required for the following sequence.
2. Group of Pictures Layer: a set of pictures where at least one (usually the first) is encoded without references to other pictures (I-frame). The order of pictures within this layer may be different from the presentation order.
3. Picture Layer: contains one picture.
4. Slice Layer: a set of macro blocks using the same DCT quantization scaling.
5. Macroblock Layer: a block of 16x16 pixels built of four 8x8 blocks of the luminance plane and two 8x8 blocks of the chrominance planes (one block of each chrominance plane).
6. Block Layer: one 8x8 block compressed using the DCT.

MPEG video compression

1. During image preparation the chrominance precision is reduced.
2. Each 8x8 block is transformed using the DCT, quantized, and run-length and Huffman encoded.
3. Temporal correlations between video frames are used to reduce the size of 16x16 pixel macro blocks.

[Figure: frame #1, frame #2, frame #3]
- Parts of frame #1 which change only slightly may be encoded as references to frame #1 in frame #2 and frame #3.
- Motion vectors are references to a 16x16 pixel area in prior or following frames describing temporal correlations.

Using Motion Vectors

While encoding a macro block the encoder searches for a similar 16x16 area in other frames. If such an area has been found, the macro block is encoded by:
- the motion vector, describing the frame and the position of the similar area
- calculating the difference between the actual 16x16 pixels and the similar area (using luminance and chrominance planes)
- applying DCT + quantization + entropy encoding to the difference only. A special value is used to describe a completely empty 8x8 block.

MPEG only defines how to describe a motion vector; it does not define an algorithm for finding such a vector:
- good encoders find good matches in prior or following frames
- medium encoders find some matches in other frames
- bad encoders do not even try to find motion vectors

Usually only the luminance values are used to find similar areas. Two adjacent macro blocks often have similar motion vectors.

MPEG frame types

Several frame types with different temporal correlations:
- Full frames are transmitted periodically (I-frames).
- Frames that depend only on preceding frames (P-frames) are used to simplify the calculation of B-frames.
- Temporal correlations may depend on frames of the past and frames of the future (B-frames).

[Figure: sequence of I-, P-, and B-frames over time t (B I B B P B B P I) with their references]

- I-frames (intracoded)
- P-frames (predictive coded)
- B-frames (bidirectionally coded)
- D-frames (DC coded)
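Since MPEG leaves the search for motion vectors to the encoder, any search strategy is an implementation choice. The following sketch shows the simplest one, an exhaustive search that minimizes the sum of absolute differences (SAD) over the luminance values. It is an illustrative assumption, not an algorithm taken from the slides or from the standard; frames are plain lists of rows, and all names are hypothetical.

```python
def sad(current, reference, bx, by, dx, dy, size):
    """Sum of absolute differences between a block and a shifted reference area."""
    return sum(abs(current[by + j][bx + i] - reference[by + dy + j][bx + dx + i])
               for j in range(size) for i in range(size))


def find_motion_vector(current, reference, bx, by, size=16, search=4):
    """Try all displacements within +/- search pixels and keep the best one."""
    height, width = len(reference), len(reference[0])
    best_vector = (0, 0)
    best_cost = sad(current, reference, bx, by, 0, 0, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= bx + dx and bx + dx + size <= width
                      and 0 <= by + dy and by + dy + size <= height)
            if not inside:
                continue
            cost = sad(current, reference, bx, by, dx, dy, size)
            if cost < best_cost:
                best_vector, best_cost = (dx, dy), cost
    return best_vector, best_cost


# Synthetic luminance frame and a copy shifted 2 pixels right and 1 pixel down
reference = [[(7 * x + 13 * y + x * y) % 256 for x in range(32)] for y in range(32)]
current = [[reference[y - 1][x - 2] if x >= 2 and y >= 1 else 0
            for x in range(32)] for y in range(32)]

vector, cost = find_motion_vector(current, reference, bx=8, by=8)
print(vector, cost)
```

A cost of zero means the macro block can be described by the vector alone; otherwise only the remaining difference has to pass through DCT, quantization and entropy coding.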

MPEG frame types

- I-frames
  - Coding without relation to other images
  - Apply DCT to 8x8 pixel blocks + spectral quantization + entropy coding (similar to JPEG)
- P-frames
  - Coding with relation to prior P- or I-frames
  - Macro blocks of P-frames may be encoded like macro blocks of I-frames, or motion vectors may be used so that only the difference to a prior frame must be encoded.
- B-frames
  - Coding with relation to prior and following I- and P-frames
  - Macro blocks of B-frames may be encoded like macro blocks of P-frames, or may be described as an interpolation of I- or P-frame macro blocks and its difference to the macro block to be coded.

[Figure: I-frame and P-frame with motion vector; the P-frame stores the difference between the macro block of the I-frame and the source image of the P-frame]
[Figure: I-frame, B-frame and P-frame with motion vectors; the B-frame stores the difference between the interpolation of the P-frame and I-frame macro blocks and the source image of the B-frame]

- D-frames
  - contain low-frequency information only, i.e. the DC coefficient (= coefficient at position 0,0)
  - only used to support fast forward
  - not necessary when periodic I-frames are used

MPEG-2

- MPEG-2 is an extension to MPEG-1 for high quality video.
- The standard is flexible:
  - Profiles are defined for application classes.
  - Different levels of quality per profile.
- MPEG-2 defines scaling capabilities, i.e. a decoder can select the required scale:
  - Spatial: the horizontal and vertical resolution can be adapted by the receiver. Each picture is coded in different sizes, whereby the coding of size n contains only the differences to the image of size n-1.
  - Rate: the number of frames per second may be defined by the receiver. Adequate placing of I-frames and (optionally) D-frames enables playback with different frame rates.

MPEG-2 Profiles

Profile attributes:

Profile           B-frames     Sampling          Scalability
Simple            no B-frames  4:2:0             not scalable
Main              B-frames     4:2:0             not scalable
SNR scalable      B-frames     4:2:0             SNR scalable
Spatial scalable  B-frames     4:2:0             SNR or spatial scalable
High              B-frames     4:2:0 or 4:2:2    SNR or spatial scalable

Maximum bit rate per level and profile:

Level                   Simple     Main       SNR scalable  Spatial scalable  High
High (1920x1152)        -          80 Mbit/s  -             -                 100 Mbit/s
High-1440 (1440x1152)   -          60 Mbit/s  -             60 Mbit/s         80 Mbit/s
Main (720x576)          15 Mbit/s  15 Mbit/s  15 Mbit/s     -                 20 Mbit/s
Low (352x288)           -          4 Mbit/s   4 Mbit/s      -                 -

MPEG-4

Work on the standard started in 1993 with these requirements:
- Efficiently represent a number of data types:
  - video from very low bit rates to very high quality conditions;
  - music and speech data for a very wide bit rate range;
  - generic dynamic 3-D objects as well as specific objects such as human faces and bodies;
  - speech and music to be synthesized by the decoder, including support for 3-D audio spaces;
  - text and graphics;
- Provide, in the encoding layer, resilience to residual errors for the various data types, for difficult channel conditions such as mobile ones;
- Independently represent the various objects in the scene, allowing independent access for their manipulation and re-use;
- Compose audio and visual, natural and synthetic, objects into one audiovisual scene;
- Describe the objects and the events in the scene;
- Provide interaction and hyperlinking capabilities;
- Manage and protect intellectual property on audiovisual content and algorithms, so that only authorized users have access;
- Provide a delivery media independent representation format, to transparently cross the borders of different delivery environments.

MPEG-4 Scene (Example)

[Figure: example of an MPEG-4 scene]

MPEG-4 Scene Description, Multiplexing & Composition

- A scene is described by an object tree and coordinates for each audio-visual object (AVO).
- Each AVO is encoded by a special codec, e.g. wavelet for textures.
- Multiplexing and composition

[Figure: multiplexing and composition of an MPEG-4 scene]

MPEG-4 Remarks

- The standard is very flexible; it may be extended by new
  - compression techniques
  - data types
- Extensions to the standard are described by the MPEG-4 Syntactic Description Language (MSDL)
- The scene description of MPEG-4 is based on VRML
- Compression efficiency:
  - Modelling a scene adds complexity to the encoder and decoder
  - Distinguishing between foreground and background leads to better compression (compared with MPEG-1 or MPEG-2) in most cases

H.261 / H.263

- Video compression for video conferences
- Compression in real time
- Targeted at ISDN
- Compressed data stream: p x 64 kbit/s, p = 1, ..., 30
- The data stream structure contains more information than just video:
  - error correction information
  - image sequence number (H.261: 5 bit / H.263: 8 bit)
  - control commands from encoder to decoder:
    - start / stop playing video
    - freeze a still image + timeout for automatic restart of video play
    - ...
- Order of appearance and technical influences: H.261, MPEG-1, H.263

H.261 / H.263 image preparation

- Video input must provide 29.97 images per second
- YUV representation with a color sampling ratio of 4:2:0
- Image sizes:
  - CIF (Common Intermediate Format): 352 x 288
  - QCIF (Quarter CIF): 176 x 144
  - H.263 only:
    - SQCIF (Sub-Quarter CIF): 128 x 96
    - 4CIF (4 times CIF): 704 x 576
    - 16CIF (16 times CIF): 1408 x 1152
- Decoders must implement QCIF only; the rest is optional

H.261 / H.263 coding

- Structures:
  - block = 8 x 8 pixels of the Y, U or V plane
  - macro block = 4 blocks of the Y plane + 1 block each of the U and V planes
  - group of blocks (GOB) = 33 macro blocks
- Intraframe coding:
  - apply the DCT to blocks
  - quantization with fixed Q-factors
    - for the DC coefficient (= coefficient at position (0,0))
    - for the AC coefficients (= all other coefficients)
  - between macro blocks within a GOB the Q-factor may be changed
- Interframe coding (more options for H.263):
  - there is a motion vector for each macro block
  - the motion vector may be null
  - otherwise the macro block contains the difference only
  - there is a special code if the difference is null
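The block, macro block and GOB sizes above fix how many coding units a picture contains. The following is a minimal sketch of that arithmetic, assuming a macro block covers 16 x 16 luminance pixels (4 blocks of 8 x 8) and using hypothetical function names. The GOB count is only meaningful for CIF and QCIF, where the macro block count is a multiple of 33.

```python
# Luminance picture sizes (width, height) for the formats named above.
FORMATS = {
    "SQCIF": (128, 96),
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "4CIF": (704, 576),
    "16CIF": (1408, 1152),
}

MACROBLOCK_SIZE = 16       # assumption: 4 Y blocks of 8 x 8 form a 16 x 16 area
MACROBLOCKS_PER_GOB = 33   # group of blocks as defined above


def macroblocks(width, height):
    """Number of macro blocks needed to cover one picture."""
    return (width // MACROBLOCK_SIZE) * (height // MACROBLOCK_SIZE)


def gobs(width, height):
    """Number of 33-macro-block GOBs (sensible for CIF and QCIF only)."""
    return macroblocks(width, height) // MACROBLOCKS_PER_GOB


def stream_rate_kbit(p):
    """Compressed data rate p x 64 kbit/s for p = 1, ..., 30."""
    if not 1 <= p <= 30:
        raise ValueError("p must be in 1..30")
    return p * 64


if __name__ == "__main__":
    for name in ("QCIF", "CIF"):
        w, h = FORMATS[name]
        print(name, macroblocks(w, h), "macro blocks,", gobs(w, h), "GOBs")
    print("p = 2 gives", stream_rate_kbit(2), "kbit/s")
```

Under these assumptions a QCIF picture consists of 99 macro blocks in 3 GOBs, and a CIF picture of 396 macro blocks in 12 GOBs.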

Further Video Compression Standards

- Digital Video Interactive (DVI): Intel/IBM technology for digital video
- 2 levels of quality:
  - DVI PLV (Production Level Video): VHS (VCR) quality
  - DVI RTV (Real Time Video): lower quality

Audio Compression

- Conventional lossless compression methods, such as run-length coding, can be used to compress sound files, but the results depend heavily on the specific sound
- Some lossy sound compression methods, such as silence compression and companding (compressing/expanding), take advantage of our perception of sound and can achieve better results
- Silence compression: small samples are treated as silence (as samples of zero), since some people have less sensitive hearing
- Companding: the ear requires more precise samples at low amplitudes than at high amplitudes. Companding uses a nonlinear formula to reduce the number of bits per sample

µ-law and A-law: encoder

- International standard (G.711), logarithm-based, for encoding digitized audio samples for ISDN digital telephony services
- Experiments indicate that the low amplitudes of speech signals contain more information than the high amplitudes
- The encoder receives a 14-bit input sample and normalizes it to a value x within the interval [-1, +1]; the G.711 standard recommends µ = 255

$$
y = \operatorname{sgn}(x)\,\frac{\ln(1 + \mu\,|x|)}{\ln(1 + \mu)},
\qquad
\operatorname{sgn}(x) =
\begin{cases}
+1, & x > 0 \\
0, & x = 0 \\
-1, & x < 0
\end{cases}
$$

- µ-law codeword format: P S2 S1 S0 Q3 Q2 Q1 Q0 (P = sign bit, S2..S0 = segment code, Q3..Q0 = quantization code)
- Since computing logarithms is expensive, in practice the encoder performs much simpler calculations that produce an approximation (example input: -656):
  1. Add 33 to the absolute value of the input sample
  2. Determine the bit position of the most significant 1-bit among bits 5 through 12 of the input
  3. Subtract 5 from that position; the result is the segment code
  4. The 4-bit quantization code is set to the four bits following the bit position determined in step 2

[Figure: bit positions 12 to 0 of the biased input sample; in the example the most significant 1-bit is at position 9, and Q3 to Q0 occupy positions 8 to 5]
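The four encoder steps, together with the inverse steps used by the decoder, can be sketched in a few lines. This is a simplified illustration rather than a conforming G.711 codec. It assumes that P = 1 marks a negative sample and that the biased magnitude is clipped to 13 bits, and all function names are hypothetical. The continuous µ-law formula is included for comparison.

```python
import math

MU = 255    # value recommended by G.711
BIAS = 33   # bias added by the encoder and removed by the decoder


def mu_law(x, mu=MU):
    """Continuous mu-law curve for a normalized sample x in [-1, +1]."""
    sign = (x > 0) - (x < 0)
    return sign * math.log(1 + mu * abs(x)) / math.log(1 + mu)


def encode(sample):
    """Approximate encoder: 14-bit signed sample -> (P, segment, quant)."""
    p = 1 if sample < 0 else 0                  # assumption: 1 means negative
    biased = min(abs(sample) + BIAS, 0x1FFF)    # step 1, clipped to 13 bits
    position = biased.bit_length() - 1          # step 2: most significant 1-bit
    segment = position - 5                      # step 3
    quant = (biased >> (position - 4)) & 0xF    # step 4: next four bits
    return p, segment, quant


def decode(p, segment, quant):
    """Inverse of encode(), up to the quantization error."""
    magnitude = ((quant * 2 + BIAS) << segment) - BIAS
    return -magnitude if p else magnitude


if __name__ == "__main__":
    codeword = encode(-656)
    print(codeword, decode(*codeword))
```

For the input -656 the biased magnitude is 689, whose most significant 1-bit is at position 9. This gives segment code 4 and quantization code 5, and decoding that codeword yields -655.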

µ-law and A-law: decoder

Codeword format: P S2 S1 S0 Q3 Q2 Q1 Q0

1. Multiply the quantization code by 2 and add 33. Example: 0101 -> 5, so 5 x 2 + 33 = 43
2. Multiply the result by 2 raised to the power of the segment code. Example: 100 -> 4, so 43 x 2^4 = 688
3. Decrement the result by 33. Example: 688 - 33 = 655
4. Use bit P to determine the sign. Example: 1 -> -655

MPEG Audio

- MPEG-1 (1992, ISO/IEC 11172-3):
  - single-channel (mono) and two-channel (stereo) coding of digitized sound signals
  - used for high-quality audio at 32, 44.1 and 48 kHz sampling rates
  - 3 coding methods are defined: Layer-1, Layer-2, Layer-3
  - predefined bit rates range from 32 to 448 kbit/s for Layer-1, from 32 to 384 kbit/s for Layer-2, and from 32 to 320 kbit/s for Layer-3
- MPEG-2 BC (1994, ISO/IEC 13818-3):
  - a backwards-compatible multichannel extension to MPEG-1; up to 5 main channels plus a low-frequency enhancement channel (5.1 channels) can be coded
  - low sample rate extension: sampling frequencies of 16, 22.05 and 24 kHz
  - lower bit rate extension: bit rates from 32 to 256 kbit/s for Layer-1, and from 8 to 160 kbit/s for Layer-2 and Layer-3
- MPEG-2 AAC (Advanced Audio Coding) (1997, ISO/IEC 13818-7):
  - a high-quality audio coding standard for 1 to 48 channels at sampling rates of 8 to 96 kHz
  - not backward compatible: it cannot be read and interpreted by an MPEG-1 audio decoder
- MPEG-4 (1999, ISO/IEC): object-based coding

MPEG Audio Layer

Both MPEG-1 and MPEG-2 BC define three layers. These layers represent a family of coding algorithms. The basic model is the same, but codec complexity, performance and delay increase with each layer.

Layer           Compression ratio   Delay           Application
Layer-1         4:1                 19 to 50 ms     Digital Compact Cassette (DCC), 384 kbit/s, stereo
Layer-2         6:1 to 8:1          35 to 100 ms    DVD; Digital Audio Broadcast, 256 kbit/s
Layer-3 (MP3)   10:1 to 12:1        59 to 150 ms    Internet audio
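The compression ratios quoted for the three layers can be checked against the data rate of uncompressed audio. The sketch below assumes CD-style PCM as the reference (44.1 kHz, 16 bit, stereo), which the slides do not state explicitly. The 384 kbit/s figure is the DCC rate named above, while 192 and 128 kbit/s are illustrative values chosen for Layer-2 and Layer-3.

```python
def pcm_rate_kbit(sample_rate_hz=44100, bits_per_sample=16, channels=2):
    """Data rate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratio(coded_kbit, **pcm_params):
    """Ratio of the uncompressed PCM rate to the coded bit rate."""
    return pcm_rate_kbit(**pcm_params) / coded_kbit


if __name__ == "__main__":
    print("PCM reference:", pcm_rate_kbit(), "kbit/s")
    # 384 kbit/s comes from the slides; 192 and 128 are assumed examples.
    for layer, rate in (("Layer-1", 384), ("Layer-2", 192), ("Layer-3", 128)):
        print(layer, rate, "kbit/s ->", round(compression_ratio(rate), 2), ": 1")
```

With this reference, 384 kbit/s corresponds to roughly 3.7:1, 192 kbit/s to about 7.4:1 and 128 kbit/s to about 11:1, which agrees with the ranges given for the three layers.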

Psychoacoustics

- MPEG audio is a modern perceptual audio coding technique that exploits psychoacoustic principles. The term psychoacoustics describes the characteristics of the human auditory system
- Sensitivity of human hearing:
  - the range is about 20 Hz to 20 kHz, most sensitive at 2 to 4 kHz
- Spectral masking effect:
  - critical bands: narrow at low audible frequencies, wide at high frequencies
- Temporal masking effect:
  - masking effect before and after a strong sound
  - pre-masking lasts about 2 to 5 ms; post-masking can last up to 100 ms

MPEG Audio encoder

- The input audio stream passes through a filter bank that divides the input into 32 frequency subbands (the audio samples are transformed from the time domain to the frequency domain)
- The subband samples are packaged into frames containing 384 samples in Layer-1 (12 samples/subband x 32) and 1152 samples in Layer-2/3 (36 x 32)
- The input audio stream simultaneously passes through a psychoacoustic model that determines the masking threshold (SMR, Signal-to-Mask Ratio) for each subband

Format of Compressed Data Layer-1

Header (32) | CRC (0,16) | Bit Allocation (128-256) | Scalefactors (0-384) | Samples | Ancillary data

- The header of each frame contains general information such as the synchronization information, the MPEG layer, the sampling frequency, the number of channels, the bit rate and the coding mode
- Bit allocation describes the number of bits per sample for each subband (samples in different subbands can be represented with different numbers of bits); for Layer-1 this allocation can be 0 to 15 bits per subband
- The 12 signals of each subband are scaled such that the largest one becomes very close to one but not greater than one; each scale factor has 6 bits
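The frame sizes and the scaling of the 12 subband samples can be illustrated with a short sketch. It makes a simplifying assumption: the exact peak magnitude is used as the scale factor, whereas a real encoder selects a 6-bit index into a table of predefined scale factors. The function names are hypothetical.

```python
SUBBANDS = 32
SAMPLES_PER_SUBBAND = {1: 12, 2: 36, 3: 36}   # per frame, by layer


def samples_per_frame(layer):
    """384 samples for Layer-1, 1152 samples for Layer-2 and Layer-3."""
    return SUBBANDS * SAMPLES_PER_SUBBAND[layer]


def frame_duration_ms(layer, sample_rate_hz):
    """Playing time covered by one frame, in milliseconds."""
    return 1000 * samples_per_frame(layer) / sample_rate_hz


def scale_block(samples):
    """Scale one subband block so that its largest magnitude becomes 1.

    Simplification: the exact peak is used as the scale factor instead of
    the nearest entry of the standard's scale factor table.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return 0.0, list(samples)
    return peak, [s / peak for s in samples]


if __name__ == "__main__":
    print(samples_per_frame(1), samples_per_frame(2))
    print(frame_duration_ms(1, 48000), "ms per Layer-1 frame at 48 kHz")
    print(scale_block([0.5, -0.25, 0.125]))
```

At 48 kHz a Layer-1 frame therefore covers 8 ms of audio and a Layer-2/3 frame 24 ms.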

Format of Compressed Data Layer-2

Header (32) | CRC (0,16) | Bit Allocation (26-188) | SCFSI (0-60) | Scalefactors (0-384) | Samples | Ancillary data

- Each frame contains 1152 samples in Layer-2/3 (36 x 32)
- A frame is divided into 3 parts, each of which resembles a Layer-1 frame
- SCFSI (scale factor selection information) has 2 bits; it indicates whether 1, 2 or 3 scale factors per subband are written in the frame

Format of Compressed Data Layer-3

Header (32) | CRC (0,16) | Side information (136, 256) | Samples

- Layer-3 allows a variable bit rate
- The encoder can borrow bits donated from past frames
- The side information includes a 9-bit pointer, main_data_begin

[Figure: sequence of frames 1 to 4, each starting with header and side info]

Bit Allocation

- The psychoacoustic model and the bit allocation algorithm are invoked to determine the number of bits allocated to the quantization of each scaled sample in each subband
- The bit allocation algorithm keeps the quantization noise (the difference between the original spectral values and the quantized spectral values) below the masking threshold while using no more than the bits available for a frame
- Since the sample rate, the samples per frame and the bit rate are known, the number of bits available per frame is fixed
- For the most efficient compression, each subband should be allocated no more bits than necessary to make the quantization noise inaudible

$$
\mathrm{MNR} = \mathrm{SNR} - \mathrm{SMR}
$$

- MNR: Mask-to-Noise Ratio
- SNR: Signal-to-Noise Ratio (defined in the standard, depending on the number of bits per sample)
- SMR: Signal-to-Mask Ratio (computed in the psychoacoustic model)

For Layer-1 and Layer-2:

- The process repeats until no more code bits can be allocated or until all the subbands have reached their maximum limit

For Layer-3:

- Bit allocation is similar to Layer-1/2, with noise allocation added
- The process stops if:
  - all scale-factor bands have the allowed noise or less
  - the next iteration would cause the amplification of any of the bands to exceed the maximum allowed value
  - the next iteration would require a requantization of all the bands
- Layer-3 uses nonuniform quantization
- Layer-3 uses Huffman coding
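The Layer-1/2 allocation loop can be sketched as a greedy procedure: repeatedly give one more bit to the subband with the lowest MNR until the frame budget is spent or every subband has reached its maximum. This is an illustration under stated assumptions, not the procedure from the standard. The SNR is approximated as 6.02 dB per bit instead of being read from the standard's table, header and scale factor overhead is ignored, and the function names are hypothetical.

```python
def bits_per_frame(bit_rate, frame_samples, sample_rate_hz):
    """Bits available for one frame at a constant bit rate (in bit/s)."""
    return bit_rate * frame_samples // sample_rate_hz


def allocate_bits(smr_db, bit_budget, samples_per_subband=12,
                  max_bits=15, snr_per_bit_db=6.02):
    """Greedy bit allocation driven by MNR = SNR - SMR.

    smr_db:      signal-to-mask ratio per subband (psychoacoustic model)
    bit_budget:  bits available for the subband samples of one frame
    Assumption:  SNR grows by snr_per_bit_db for every allocated bit.
    """
    bits = [0] * len(smr_db)
    remaining = bit_budget
    while remaining >= samples_per_subband:
        candidates = [i for i, b in enumerate(bits) if b < max_bits]
        if not candidates:
            break   # every subband has reached its maximum limit
        # Subband whose quantization noise is currently most audible.
        worst = min(candidates,
                    key=lambda i: snr_per_bit_db * bits[i] - smr_db[i])
        bits[worst] += 1
        remaining -= samples_per_subband   # one extra bit for each sample
    return bits


if __name__ == "__main__":
    print(bits_per_frame(384000, 384, 48000), "bits per Layer-1 frame")
    print(allocate_bits([20.0, 5.0, -10.0], bit_budget=60))
```

In the usage example the subband with the highest SMR receives most of the bits, while the subband whose signal already lies below the masking threshold receives none.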

MPEG-2 AAC (Advanced Audio Coding)

- MPEG-2 AAC is the state-of-the-art audio coding scheme for generic coding of stereo and multichannel signals; it gives up backwards compatibility with MPEG-1
- AAC follows the same basic coding paradigm as Layer-3 (high frequency resolution filter bank, non-uniform quantization, Huffman coding, iteration loop structure), but improves on Layer-3 in many details and uses new coding tools for improved quality at low bit rates
- The AAC standard specifies only the format of the encoded audio data
- AAC is also the audio coding scheme used within the MPEG-4 standard
- Applications:
  - digital broadcasting systems
  - audio via the Internet
