MPEG-2 VIDEO COMPRESSION TECHNIQUE

By PRATEEK RAJ GAUTAM, HBTI KANPUR

Abstract

MPEG-2 is an extension of the MPEG-1 international standard for digital compression of audio and video signals. MPEG-1 was designed to code progressively scanned video at bit rates up to about 1.5 Mbit/s for applications such as CD-I (compact disc interactive). MPEG-2 is directed at broadcast formats at higher data rates; it provides extra algorithmic tools for efficiently coding interlaced video, supports a wide range of bit rates, and provides for multichannel surround sound coding. This paper introduces the principles used for compressing video according to the MPEG-2 standard and outlines the compression techniques.

Introduction

The MPEG committee began its life in late 1988 by the hand of Leonardo Chiariglione and Hiroshi Yasuda, with the immediate goal of standardizing video and audio for compact discs. Over the next few years, participation amassed from international technical experts in the areas of Video, Audio, and Systems, reaching over 200 participants by 1992. By the end of the third year (1990), a syntax emerged which, when applied to SIF video and compact disc audio sample rates at a combined coded bit rate of 1.5 Mbit/s, approximated the perceptual quality of consumer video tape (VHS).

After demonstrations proved that the syntax was generic enough to be applied to bit rates and sample rates far higher than the original primary target application, a second phase (MPEG-2) was initiated within the committee to define a syntax for efficient representation of broadcast video. Efficient representation of interlaced (broadcast) video signals was more challenging than the progressive (non-interlaced) signals coded by MPEG-1. Similarly, MPEG-1 audio was capable of directly representing only two channels of sound. MPEG-2 would introduce a scheme to decorrelate multichannel discrete surround sound audio.

The need for a third phase (MPEG-3) was anticipated in 1991 for High Definition Television, although it was discovered during late 1992 and 1993 that the MPEG-2 syntax simply scaled with the bit rate, obviating the third phase. MPEG-4 was launched in late 1992 to explore the requirements of a more diverse set of applications, while finding a more efficient means of coding low bit rate/low sample rate video. Today, MPEG (video and systems) is the exclusive syntax of the United States Grand Alliance HDTV specification, the European Digital Video Broadcasting Group, and the high-density compact disc (led by rivals Sony/Philips and Toshiba).

The MPEG (Moving Picture Experts Group)

MPEG-2 is dismissed by many as inappropriate for digital cinema since it is often viewed at high compression ratios in low bit-rate applications. But MPEG-2 is fundamentally a rich set of compression tools, with capabilities that are not made available by the commonly defined profiles and levels. By looking deeper than the usual implementations of the standard, it is possible to find enhancements that enable the high picture quality required by the digital cinema application. Enhancements in constant quality rate control, color space, and bit depth are possible while still adhering to the basic MPEG-2 bit-stream specification. The enhancements work together with the same silicon devices that are used in larger markets, allowing digital cinema to take advantage of the beneficial price/performance ratio in the compression and playback systems.

Need For Compression

Video is a sequence of pictures, and each picture consists of an array of pixels. Uncompressed video is very large: with CCIR-601 parameters (720 pixels x 480 pixels x 30 frames/s), the data rate is about 165 Mbit/s. This data rate is too high for user-level applications and is a heavy load for the CPU and for communication links. Video compression is used to reduce the size. There are two kinds of compression method: lossless and lossy. Lossless methods such as Huffman, arithmetic and LZW coding do not work well for video on their own, since the pixel values are spread over a wide range.

Compression Capabilities Of MPEG-2

MPEG-2 provides a way to compress this digital video signal to a manageable bit rate. The compression capability of MPEG-2 video compression is shown in Table 1. Therefore the higher the picture quality for a given ...

Table 1: Summary of compression capabilities

Because the MPEG-2 standard provides good compression using standard algorithms, it has become the standard for digital TV.
It has the following features:
- Full-screen interlaced and/or progressive video (for TV and computer displays)
- Enhanced audio coding (high quality, mono, stereo, and other audio features)
- Transport multiplexing (combining different MPEG streams in a single transmission stream)
- Other services (GUI, interaction, encryption, data transmission, etc.)
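
The figure of about 165 Mbit/s quoted earlier for uncompressed CCIR-601 video can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `raw_bit_rate` is hypothetical, and the value of 16 bits per pixel assumes 4:2:2 chroma sampling at 8 bits per sample, which the text does not state explicitly.

```python
def raw_bit_rate(width, height, fps, bits_per_pixel):
    """Uncompressed video bit rate in bits per second."""
    return width * height * fps * bits_per_pixel


# Assumption: 4:2:2 sampling at 8 bits per sample, i.e. 16 bits per pixel on average.
rate = raw_bit_rate(720, 480, 30, 16)
print(f"{rate / 1e6:.1f} Mbit/s")  # 165.9 Mbit/s
```

Under the same assumption, one second of such video occupies roughly 20 megabytes, which is why compression is needed before storage or transmission.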

The list of systems which now (or will soon) use MPEG-2 is extensive and continuously growing: digital TV (cable, satellite and terrestrial broadcast), Video on Demand, Digital Versatile Disc (DVD), personal computing, card payment, test and measurement, etc.

The MPEG-2 video compression algorithm achieves very high rates of compression by exploiting the redundancy in video information. MPEG-2 removes both the temporal redundancy and the spatial redundancy present in motion video. Temporal redundancy arises when successive frames of video display images of the same scene. It is common for the content of the scene to remain fixed or to change only slightly between successive frames. Spatial redundancy occurs because parts of the picture (called pels) are often replicated (with minor changes) within a single frame of video.

Clearly, it is not always possible to compress every frame of a video clip to the same extent: some parts of a clip may have low spatial redundancy (e.g. complex picture content), while other parts may have low temporal redundancy (e.g. fast-moving sequences). The compressed video stream is therefore naturally of variable bit rate, whereas transmission links frequently require fixed transmission rates. The key to controlling the transmission rate is to order the compressed data in a buffer in order of decreasing detail. Compression may be performed by selectively discarding some of the information. The impact on overall picture quality is minimized by throwing away the most detailed information while preserving the less detailed picture content. This limits the overall bit rate with minimal impairment of picture quality. The basic operation of the encoder is shown below:

Basic Operation of an MPEG-2 Encoder

MPEG-2 includes a wide range of compression mechanisms. An encoder must therefore decide which compression mechanisms are best suited to a particular scene or sequence of scenes. In general, the more sophisticated the encoder, the better it is at selecting the most appropriate compression mechanism and transmission bit rate. MPEG-2 decoders also come in various types and have varying capabilities (including the ability to handle high quality video and the ability to cope with errors) and connection options.

Block diagram of encoder and decoder:

Most common implementations of MPEG-2 are designed to work with some fixed bandwidth distribution channel. The 19.4 Mb/s payload of the ATSC digital television transmission standard is one example. These implementations apply a constant bit-rate control algorithm to the compression engine, to make sure that every picture can be delivered through the channel at the correct time. This type of rate control necessarily causes the picture quality after compression to vary from scene to scene. In digital cinema, the priority is for consistent picture quality from the first image to the last, before any requirement for fixed bandwidth transmission. Compression for digital cinema should use a variable bit-rate, constant quality mechanism for rate control.
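
The difference between the two rate-control styles can be illustrated with a toy model. The sketch below is not taken from the MPEG-2 specification: it assumes that the coded size of a picture is roughly its "complexity" divided by the quantiser scale, and the function names, the starting scale of 8 and the 1 to 31 clamping range are all choices made for illustration.

```python
def constant_quality(complexities, qscale):
    """Fixed quantiser scale: the coded size follows the picture complexity."""
    return [c / qscale for c in complexities]


def constant_bit_rate(complexities, target_bits, qscale=8.0, q_min=1.0, q_max=31.0):
    """Feedback control: after each picture, rescale the quantiser so that
    the picture just coded would have met the target size."""
    sizes, scales = [], []
    for c in complexities:
        bits = c / qscale          # assumed model: bits = complexity / qscale
        sizes.append(bits)
        scales.append(qscale)
        # Coarser quantisation after an oversized picture, finer after a small one.
        qscale = min(q_max, max(q_min, qscale * bits / target_bits))
    return sizes, scales


# Two simple pictures followed by two complex ones (made-up complexity values).
complexities = [800.0, 800.0, 1600.0, 1600.0]

print(constant_quality(complexities, 8.0))   # [100.0, 100.0, 200.0, 200.0]

sizes, scales = constant_bit_rate(complexities, target_bits=100.0)
print(sizes)    # [100.0, 100.0, 200.0, 100.0]
print(scales)   # [8.0, 8.0, 8.0, 16.0]
```

In this model the constant-quality encoder spends twice as many bits on the complex pictures, while the constant-bit-rate encoder returns to the target size by doubling the quantiser scale, which is the scene-to-scene quality variation described above.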

In fact, constant quality rate control is inherent to the basic set of compression tools of MPEG-2, listed in Figure 1. These operations naturally result in complex pictures being allocated more bits, and simple pictures fewer. The common practice to achieve a constant bit rate involves adding a layer of control over these tools to monitor compressed picture sizes and adjust quantization for each picture. In compression for digital cinema, this control layer is disabled. The following paragraphs show how the basic MPEG-2 compression tools result in constant quality encoding.

The DCT transforms the image data from the spatial domain to the frequency domain. As an example, the block of image data in Figure 2a,b is transformed to the DCT coefficients in Figure 2c.

FIGURE 2: Block discrete cosine transform: (a) image block; (b) image block with luma represented as height; (c) DCT coefficients. The dc term is in the front corner.

At this stage no information from the original image data has been lost; taking the inverse DCT of the coefficients in Figure 2c exactly reproduces the original source data. The DCT coefficients are all signed 11-bit integers except for the dc term, which is unsigned, up to 11 bits. The advantage of the DCT is that most of the coefficients are zero, and many of the rest are small values. In the subsequent variable-length coding operation, small values translate to short codes and zero values are run-length coded. It may seem reasonable to omit quantizing the DCT coefficients altogether and apply the run-length/variable-length codes to the DCT coefficients directly. The result is essentially lossless compression with about a 2X compression ratio.

Picture Types

The MPEG standard specifically defines three types of pictures:
1. Intra pictures (I-pictures)
2. Predicted pictures (P-pictures)
3. Bidirectional pictures (B-pictures)

These three types of pictures are combined to form a group of pictures.

Intra pictures, or I-pictures, are coded using only information present in the picture itself, and provide potential random access points into the compressed video data. They use only transform coding and provide moderate compression, typically about two bits per coded pixel.

Predicted Pictures

Predicted pictures, or P-pictures, are coded with respect to the nearest previous I- or P-picture. This technique is called forward prediction and is illustrated in the figure above. Like I-pictures, P-pictures can also serve as a prediction reference for B-pictures and future P-pictures. Moreover, P-pictures use motion compensation to provide more compression than is possible with I-pictures.

Bidirectional Pictures

Bidirectional pictures, or B-pictures, are pictures that use both a past and a future picture as a reference. This technique is called bidirectional prediction. B-pictures provide the most compression since they use both the past and the future picture as references; however, they also require the most computation time.

Method of Encoding Pictures

Intra Pictures

The MPEG transform coding algorithm includes the following steps:
1. Discrete cosine transform (DCT)
2. Quantization
3. Run-length encoding

Both image blocks and prediction-error blocks have high spatial redundancy. To reduce this redundancy, the MPEG algorithm transforms 8x8 blocks of pixels or 8x8 blocks of error terms from the spatial domain to the frequency domain with the discrete cosine transform (DCT). The combination of DCT and quantization results in many of the frequency coefficients being zero, especially the coefficients for high spatial frequencies. To take maximum advantage of this, the coefficients are organized in a zigzag order to produce long runs of zeros. The coefficients are then converted to a series of run-amplitude pairs, each pair indicating a number of zero coefficients and the amplitude of a non-zero coefficient. These run-amplitude pairs are then coded with a variable-length code (Huffman encoding), which uses shorter codes for commonly occurring pairs and longer codes for less common pairs.

Some blocks of pixels need to be coded more accurately than others; for example, blocks with smooth intensity gradients need accurate coding to avoid visible block boundaries. To deal with this inequality between blocks, the MPEG algorithm allows the amount of quantization to be modified for each macroblock of pixels. This mechanism can also be used to provide smooth adaptation to a particular bit rate.

Predicted Pictures
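
Both intra blocks and the prediction-error blocks of predicted pictures pass through the transform-coding chain described above: DCT, quantization, zigzag scan and run-amplitude pairs. The following is a minimal sketch of that chain for one 8x8 block, written in plain Python. It is a simplification under stated assumptions: a single uniform step size stands in for the MPEG-2 quantization matrix and quantiser scale, the dc term is treated like any other coefficient, the final Huffman stage is omitted, and all function names are hypothetical.

```python
import math

N = 8  # transform block size


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    # basis[k][n] = cos((2n + 1) * k * pi / (2N))
    basis = [[math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
             for k in range(N)]
    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = sum(block[x][y] * basis[u][x] * basis[v][y]
                        for x in range(N) for y in range(N))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def quantize(coeffs, step):
    """Uniform quantization with one step size (assumption, see lead-in)."""
    return [[round(value / step) for value in row] for row in coeffs]


def zigzag_order(n=N):
    """Block coordinates in zigzag scan order, low frequencies first."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))


def run_level_pairs(quantized):
    """(zero_run, level) pairs for the non-zero coefficients in zigzag order.
    Trailing zeros are dropped, as an end-of-block code would imply them."""
    pairs, run = [], 0
    for r, c in zigzag_order():
        level = quantized[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs


# A smooth horizontal ramp: 64 pixel values in, a handful of pairs out.
block = [[100 + 4 * y for y in range(N)] for _ in range(N)]
print(run_level_pairs(quantize(dct_2d(block), step=16)))
```

Because this block varies only horizontally, all non-zero coefficients fall in the first row of the coefficient array, so the zigzag scan yields only a few pairs for the 64 input pixels.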

A P-picture is coded with reference to a previous image (the reference image), which is an I- or P-picture. In the figure above, the highlighted block in the target image (the image to be coded) is similar to a block in the reference image, except that it is shifted to the upper right. Since most of the changes between the target and reference images can be approximated as translations of small image regions, a key technique called motion-compensated prediction is used. Motion-compensated prediction exploits temporal redundancy. Because frames are closely related, it is possible to accurately represent or "predict" the data of one frame from the data of a reference image, provided the translation is estimated. Prediction greatly reduces the number of bits required.

In P-pictures, each 16x16 macroblock is predicted from a macroblock of a previously encoded I- or P-picture. Since frames are snapshots in time of a moving object, the matching macroblocks in the two frames may not be co-sited, i.e. they may not correspond to the same spatial location. Hence, a search is conducted in the reference frame to find the macroblock that most closely matches the macroblock under consideration in the P-frame. The difference between the two macroblocks is the prediction error. This error can be coded in the DCT domain. The DCT of the error results in few high-frequency coefficients, which after quantization require a small number of bits for representation. The quantization matrices for the prediction-error blocks are different from those used in intra blocks, due to the distinct nature of their frequency spectrum. The displacements in the horizontal and vertical directions of the best-match macroblock from the co-sited macroblock are called motion vectors. Differential coding is used because it reduces the total bit requirement by transmitting only the difference between successive motion vectors. Finally, run-length encoding and Huffman coding are used to encode the data.
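
The macroblock search and the differential coding of motion vectors described above can be sketched as follows. This is an illustration under assumptions rather than the method of any particular encoder: it uses an exhaustive integer-pel search with the sum of absolute differences (SAD) as the matching criterion, a search range of 7 pels, and a synthetic test pattern. The function names and the test data are hypothetical.

```python
def sad(ref, cur, rx, ry, cx, cy, size):
    """Sum of absolute differences between the block of `ref` at (rx, ry)
    and the block of `cur` at (cx, cy)."""
    return sum(abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
               for j in range(size) for i in range(size))


def find_motion_vector(ref, cur, cx, cy, size=16, search=7):
    """Exhaustive search in `ref` for the best match to the macroblock of
    `cur` at (cx, cy). Returns (dx, dy, error) for the best candidate."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                err = sad(ref, cur, rx, ry, cx, cy, size)
                if best is None or err < best[2]:
                    best = (dx, dy, err)
    return best


def differential_vectors(vectors):
    """Replace each vector by its difference from the previous one.
    The predictor starts at (0, 0)."""
    prev = (0, 0)
    diffs = []
    for dx, dy in vectors:
        diffs.append((dx - prev[0], dy - prev[1]))
        prev = (dx, dy)
    return diffs


def pattern(x, y):
    """Synthetic, non-repeating picture content (made-up test data)."""
    return x * x + 3 * y * y + x * y


SIZE = 48
ref = [[pattern(x, y) for x in range(SIZE)] for y in range(SIZE)]
# The current frame shows the same content displaced by (3, -2).
cur = [[pattern(x + 3, y - 2) for x in range(SIZE)] for y in range(SIZE)]

print(find_motion_vector(ref, cur, 16, 16))                # (3, -2, 0)
print(differential_vectors([(3, -2), (3, -2), (4, -2)]))   # [(3, -2), (0, 0), (1, 0)]
```

The search recovers the displacement with zero prediction error because the test frames differ by a pure translation. The second output shows why differential coding helps: when neighbouring vectors are similar, most of the transmitted differences are zero or small.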

Bidirectional Pictures example:

The pictures above show that some information in the picture being coded is not present in the previous reference frame. Hence a B-picture is coded like a P-picture, except that the motion vectors can reference the previous reference picture, the next reference picture, or both. The following is the mechanism of B-picture coding.

MPEG-2 in everyday life

MPEG-2 is used just about wherever you see video today.

DBS (Direct Broadcast Satellite)

The Hughes/USSB service will use MPEG-2 video and audio. Thomson has exclusive rights to manufacture the decoding boxes for the first 18 months of operation. No doubt Thomson's STi-3500 MPEG-2 video decoder chip will be featured. Hughes/USSB DBS began service in North America in 1994. Two satellites at 101 degrees West share the power requirements of 120 Watts per 27 MHz transponder. Multi-source channel rate control methods are employed to optimally allocate bits between several programs on one data carrier. An average of 150 channels is planned.

CATV (Cable Television)

Despite conflicting options, the cable industry has more or less settled on MPEG-2 video. Audio is less settled. For example, General Instruments (the largest U.S. consumer cable set-top box manufacturer) has announced the planned use of the Dolby AC-3 audio algorithm.

DigiCipher

The General Instruments DigiCipher I video syntax is similar to MPEG-2 syntax but uses smaller macroblock predictions and no B-frames. The DigiCipher II specification includes modes to support both the GI and full MPEG-2 Video Main Profile syntax. Services such as HBO will upgrade to DigiCipher II in 1994. At the European IBC broadcast technology convention in September 1994, GI demonstrated a prototype DCII encoder which handles both digital encoding standards. Fully configured, the encoder will be able to process 16 analogue video inputs, plus 32 stereo audio channels and 32 data channels, into a single high-speed datastream which can be carried on cable, satellite, microwave or ATM systems.
DCII technology has now been licensed to Scientific Atlanta and Hewlett Packard (both set-top manufacturers) and to chip manufacturers Motorola, LSI Logic and C-Cube. All these manufacturers already support MPEG-2 and plan to incorporate DCII into dual-mode digital video decoder chips for the set-top terminal market.

HDTV

The U.S. Grand Alliance, a consortium of companies that formerly competed for the U.S. terrestrial HDTV standard, has already agreed to use the MPEG-2 Video and Systems syntax (including B-pictures). Both interlaced (1440 x 960 x 30 Hz) and progressive (1280 x 720 x 60 Hz) modes will be supported. The Alliance must then settle upon a modulation (QAM, VSB, OFDM), convolution (MS or Viterbi), and error correction (RSPC, RSFC) specification.

In September 1993, a consortium of 85 European companies signed an agreement to fund a project known as Digital Video Broadcasting (DVB), which will develop a standard for cable and terrestrial transmission by the end of 1994. The scheme will use MPEG-2. This consortium has put the final nail in the coffin of the D-MAC scheme for gradual migration towards an all-digital HDTV consumer transmission standard. The only remaining analog or digital-analog hybrid system left in the world is NHK's MUSE.

Conclusion

MPEG-2 has been very successful in defining a specification to serve a range of applications, bit rates, qualities and services. Currently, the major interest is in the main profile at main level (MP@ML) for applications such as digital television broadcasting (terrestrial, satellite and cable), video-on-demand services and desktop video systems. Several manufacturers have announced MP@ML single-chip decoders and multichip encoders. Prototype equipment supporting the SNR and spatial profiles has also been constructed for use in broadcasting field trials. The specification only defines the bitstream syntax and decoding process. Generally, this means that any decoders which conform to the specification should produce near-identical output pictures. However, decoders may differ in how they respond to errors introduced in the transmission channel. For example, an advanced decoder might attempt to conceal faults in the decoded picture if it detects errors in the bitstream.
For a coder to conform to the specification, it only has to produce a valid bitstream. This condition alone has no bearing on the picture quality through the codec, and there is likely to be a variation in coding performance between different coder designs. For example, the coding performance may vary depending on the quality of the motion-vector measurement, the techniques for controlling the bit rate, the methods used to choose between the different prediction modes, the degree of picture preprocessing and the way in which the quantiser is adapted according to the picture content. The picture quality through an MPEG-2 codec depends on the complexity and predictability of the source pictures. Real-time coders and decoders have demonstrated generally good quality standard-definition pictures at bit rates around 6 Mbit/s. As experience of MPEG-2 coding increases, the same picture quality may be achievable at lower bit rates.
