Novel Development of Video Coding using SVC Concepts in IP Scenario
Rahimunisa Nagma, Dr. TC Manjunath, Pavithra G.
MTech, Department of ECE, HKBKCE Nagawara, Bangalore, Karnataka-560045, India

Abstract
Scalable video coding is a very useful option for video service providers because it can adapt a video's bit stream at the server to suit various network conditions and device characteristics. For compression, the video's bit rate has to be lowered. This can be achieved by reducing the frame rate, reducing the spatial resolution, and/or increasing the level of quantization applied to the video sequence under consideration. In this paper we evaluate the effects of scalability using the full-reference metrics PSNR and SSIM together with the reduced-reference (or no-reference) metrics blocking and blurring. We also compare various video coding standards and highlight the advantages of SVC over prior video coding standards.

Keywords
H.264 SVC, PSNR, SSIM, Blocking, Blurring.

I. INTRODUCTION
With the introduction of successive video coding standards, significant improvements in video compression capability have been demonstrated. Scalable Video Coding (SVC) has been standardized by the Joint Video Team of the ISO/IEC MPEG and the ITU-T VCEG as an extension of the existing H.264/AVC video standard. SVC enables the transmission and decoding of partial bit streams, providing video services with reduced temporal or spatial resolution or reduced fidelity while retaining a reconstruction quality that is high relative to the rate of the partial bit streams.
Hence SVC provides adaptation of power, bit rate and format, as well as graceful degradation in environments with lossy transmission. Relative to the scalable profiles of previous video coding standards, SVC achieves significant improvements in coding efficiency along with an improved degree of supported scalability.

II. OBJECTIVE
The main objective of the proposed dissertation work is to realize traffic-aware video coding using Scalable Video Coding. The dissertation aims to achieve scalability of the video stream in three dimensions, namely spatial, temporal and quantization, through degradation that takes varying internet speeds into account. Once scalability is achieved, the effect of network transmission and of the scalability options is quantified at the end-user side, that is, at the final receiver of the transmitted video. The results can be used to select scalability options that satisfy end-user requirements under the given quality or bandwidth constraints.

III. PROPOSED WORK
In the proposed work we use a reduced-reference method instead of a full-reference method to obtain a quantitative measure of video quality at the end-user side. As stated in the objective, scalability is targeted in the spatial, temporal and quantization dimensions under varying internet speeds. We also provide feedback to the network service provider about the perceptual loss introduced by the wireless transmission network, which simplifies the assessment strategy considerably. We describe how scalability is achieved by outlining the basic concepts of extending the previous video coding standard, H.264/Advanced Video Coding (AVC), towards the Scalable Video Coding (SVC) standard. A block diagram of the proposed work shows how traffic-aware video coding is carried out using Scalable Video Coding.
Encoder and decoder block diagrams are also shown separately, with detailed explanations.

Scalability is the ability of a system, network or process to handle a growing amount of work in a capable manner, or its ability to be enlarged to accommodate that growth. A video bit stream is called scalable if parts of the stream can be removed such that the resulting sub-stream forms another valid bit stream that represents the source content with lower reconstruction quality than the original bit stream. The most common scalability modes are temporal, spatial and quality scalability:
1) Temporal scalability: The subset bit stream represents the source with a reduced frame rate (temporal resolution).
2) Spatial scalability: The subset bit stream represents the source with a reduced picture size (spatial resolution).
3) Quality scalability: The subset bit stream keeps the same spatial and temporal resolution but has lower fidelity. It is commonly referred to as signal-to-noise-ratio (SNR) or fidelity scalability.

Figure 1: Types of Scalabilities
ISSN: 2231-5381 http://www.ijettjournal.org Page 405
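To make the three scalability modes concrete, the following is a minimal Python sketch that applies each kind of reduction directly to decoded luma frames. It is only an illustration in the pixel domain: an actual SVC encoder realizes these reductions as layers of the coded bit stream, and the function names, the 2x2 averaging filter and the default step size are assumptions introduced here, not part of the proposed work.

```python
from typing import List

# A frame is modelled as one luma plane: rows of 8-bit sample values.
Frame = List[List[int]]


def temporal_subset(frames: List[Frame], keep_every: int = 2) -> List[Frame]:
    """Temporal scalability: lower the frame rate by keeping every k-th frame."""
    return frames[::keep_every]


def spatial_subset(frame: Frame) -> Frame:
    """Spatial scalability: halve width and height by averaging 2x2 blocks."""
    h = len(frame) // 2 * 2       # ignore a trailing odd row
    w = len(frame[0]) // 2 * 2    # ignore a trailing odd column
    return [
        [
            (frame[y][x] + frame[y][x + 1]
             + frame[y + 1][x] + frame[y + 1][x + 1] + 2) // 4  # rounded mean
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def quality_subset(frame: Frame, step: int = 16) -> Frame:
    """Quality scalability: same resolution, coarser quantization of samples."""
    # Each sample is mapped to the midpoint of its quantization interval.
    return [[min(255, (s // step) * step + step // 2) for s in row]
            for row in frame]
```

For example, `temporal_subset(frames)` halves the frame rate of a sequence, while `spatial_subset(frame)` turns a 4x4 plane into a 2x2 plane. A larger `step` in `quality_subset` plays the role of a higher quantization level.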
Figure 2: Proposed Block Diagram
The video sequence is first converted into a YUV sequence; the YUV model defines a color space in terms of one luma (Y) and two chrominance (UV) components. The raw YUV sequence is encoded by the H.264 SVC encoder into three different bit streams, each scaled down along one dimension: temporal, spatial or quantization. Since SVC can adjust to different network conditions and device characteristics, the encoded scalable bit stream is subjected to network-based video scaling. The bit stream is then transmitted and fed to the H.264 SVC decoder, whose decoded output is called the extracted YUV sequence. The extracted YUV sequence is compared with the raw original sequence to assess the effect of degradation in the spatial, temporal or quantization dimension on video quality. Both full-reference and reduced-reference video quality measurements are made: full-reference measurement requires both the original and the extracted sequence, whereas reduced-reference measurement relies mainly on the extracted YUV sequence.

Figure 3: Basic Block diagram of Encoder and Decoder
In the basic block diagram of the encoder and decoder, the source video is taken from a PC display and encoded using the H.264 SVC encoder to obtain a binary video bit stream. This bit stream is transmitted over an HTTP network, which is designed to enable client-server communication. From the HTTP network, the bit stream is fed to the H.264 SVC decoder, which decodes it to give the extracted video sequence that is shown on the receiver-side PC display.

A. Encoder part
Figure 4: General H.264 Encoder block diagram
The source video is taken and prediction of the video is carried out. H.264 SVC involves intra prediction and inter prediction.
In intra prediction, a macroblock is predicted by referring only to the current slice, without using any data outside it. For the luma component there are three choices of intra prediction block size, 16x16, 8x8 or 4x4, and for the chroma component there is a single prediction block. An example of a 4x4 block to be predicted is shown below:
Figure 5: 4x4 luma block to be predicted

Inter prediction refers to previously coded frames using motion-compensated prediction. It involves selecting a prediction region, generating a prediction block, and subtracting this block from the original block of samples to give a residual, which is then coded and transmitted. A motion vector is the offset between the position of the current partition and the prediction region in the reference picture. Each motion vector is coded differentially with respect to the motion vectors of neighbouring blocks. The macroblock and sub-macroblock partitions are shown below:
Figure 6: Macroblock partitions and sub-macroblock partitions
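The inter prediction steps described above (region selection, residual generation and differential motion vector coding) can be sketched in Python as follows. This is a simplified illustration, not the encoder used in this work: it assumes an exhaustive integer-pixel search with a sum-of-absolute-differences cost, and it assumes a component-wise median of the neighbouring motion vectors as the predictor. The helper names `motion_search` and `mv_difference` are hypothetical.

```python
from typing import List, Optional, Tuple

Picture = List[List[int]]   # one luma plane
MV = Tuple[int, int]        # motion vector as (dx, dy)


def get_block(picture: Picture, x: int, y: int, size: int) -> Picture:
    """Cut a size x size block whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in picture[y:y + size]]


def sad(a: Picture, b: Picture) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def motion_search(cur: Picture, ref: Picture, x: int, y: int,
                  size: int, search: int = 2) -> Tuple[MV, Picture]:
    """Select the best prediction region in `ref` and return (mv, residual)."""
    target = get_block(cur, x, y, size)
    best_mv: MV = (0, 0)
    best_cost: Optional[int] = None
    h, w = len(ref), len(ref[0])
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            if rx < 0 or ry < 0 or rx + size > w or ry + size > h:
                continue  # candidate region lies outside the reference picture
            cost = sad(target, get_block(ref, rx, ry, size))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    pred = get_block(ref, x + best_mv[0], y + best_mv[1], size)
    # The residual is the original block minus the prediction block.
    residual = [[t - p for t, p in zip(rt, rp)] for rt, rp in zip(target, pred)]
    return best_mv, residual


def mv_difference(mv: MV, neighbours: List[MV]) -> MV:
    """Differential coding: subtract the median of the neighbouring vectors."""
    xs = sorted(n[0] for n in neighbours)
    ys = sorted(n[1] for n in neighbours)
    pred = (xs[len(xs) // 2], ys[len(ys) // 2])
    return (mv[0] - pred[0], mv[1] - pred[1])
```

When the current block is an exact copy of a displaced region of the reference picture, the residual is all zeros and only the small motion vector difference remains to be coded, which is where the bit-rate saving of inter prediction comes from.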
B. Transform and Quantization
The transform and quantization in H.264/SVC are designed to minimize computational complexity, to avoid encoder-decoder mismatch, and to be suitable for implementation using limited-precision integer arithmetic. This is achieved by:
Using a core transform, an integer transform that can be carried out using integer or fixed-point arithmetic, and
Integrating a normalization step into the quantization process, to minimize the number of multiplications required to process a block of residual data.
The scaling and inverse transform processes carried out by a decoder are exactly specified, so that every H.264 implementation produces identical results and mismatch between different transform implementations is eliminated.

C. Entropy Encoding
Prior to entropy coding, the blocks of transform coefficients are converted into a linear array. The intention of the scan order is to group together the non-zero quantized coefficients, called significant coefficients. For a block of a typical progressive frame, the non-zero coefficients tend to be clustered around the top-left DC coefficient, and a zigzag scan order is the most efficient in this case. An example of the progressive scan order for 4x4 blocks is shown in Figure 7.

H.264 SVC uses the CABAC entropy coding mode, in which the probability models are updated based on previous coding statistics. The following steps are performed while coding:
Encoding of binary decisions: Only binary decisions (1 or 0) are encoded, so a non-binary-valued symbol is binarized prior to arithmetic coding. The following steps are repeated for each bit, or bin, of the binarized symbol:
1) Selection of the context model: A context model is a probability model for one or more bins. It is selected from the available models depending on the statistics of recently coded data symbols, and it stores the probability of each bin being 1 or 0.
2) Arithmetic encoding: An arithmetic coder encodes each bin according to the selected probability model. There are only two sub-ranges for each bin, corresponding to the values 0 and 1.
3) Updating of the selected context model: The selected model is updated based on the actual coded value. For example, if the bin value was 0, the frequency count of 0s is increased.

Figure 7: Zigzag coding pattern

An H.264 stream consists of a series of coded symbols. There are several methods for coding these symbols, including:
1) Fixed length code: The symbols are converted into a binary code of a specified length.
2) CAVLC (Context-Adaptive Variable Length Coding): Using context adaptation, different sets of variable-length codes are chosen depending on the statistics of recently coded coefficients.
3) CABAC (Context-Adaptive Binary Arithmetic Coding): A method of arithmetic coding in which the probability models are updated based on previous coding statistics.

The result is compressed H.264 data, which is then transmitted (or stored) via a wireless HTTP network to the H.264 SVC decoder.

D. Decoder part
The compressed H.264 bit stream is given to the H.264 SVC video decoder, which extracts information such as quantized transform coefficients and prediction information by decoding each syntax element. This information is then used to reverse the coding process and recreate the video sequence.
Figure 8: General H.264 Decoder block diagram

IV. PROJECT IMPLEMENTATION AND SIMULATION REPORT
The dissertation work started with the implementation of the H.264 SVC encoder and decoder parts. Subjective and objective measurements of video quality are then carried out.
1) Subjective measurement of video quality: This is based on, or influenced by, personal opinions. It does not give a quantitative measure of quality but describes it in words such as good, better and best. Subjective quality is influenced by the human visual system (HVS), comprising the eye and the brain. A viewer's opinion about video quality is also affected by other factors such as state of mind, viewing environment and visual attention.
2) Objective measurement of video quality: This gives a quantitative measure of video quality at lower complexity and cost than subjective measurement. It comprises full-reference evaluation and reduced-reference evaluation of the video.

Examples of full-reference evaluation methods are:
1) PSNR: The ratio of the peak signal energy to the error energy, expressed in decibels. For samples of n bits and mean squared error MSE,
\[ \mathrm{PSNR}_{\mathrm{dB}} = 10 \log_{10} \frac{(2^{n} - 1)^{2}}{\mathrm{MSE}} \tag{1} \]
Taking n = 8,
\[ \mathrm{PSNR}_{\mathrm{dB}} = 10 \log_{10} \frac{255^{2}}{\mathrm{MSE}} \tag{2} \]
PSNR is simple to calculate, requires very little time, and is widely used to compare the quality of compressed and decompressed video images.
2) SSIM: This is based on measuring three components, luminance similarity, contrast similarity and structural similarity, and then combining them to give the result:
\[ \mathrm{SSIM}(i) = \frac{(2 m_x m_y + c_1)(2\,\mathrm{cov}_{xy} + c_2)}{(m_x^{2} + m_y^{2} + c_1)(\mathrm{var}_x + \mathrm{var}_y + c_2)} \tag{3} \]
where m_x and m_y are the means, var_x and var_y the variances and cov_xy the covariance of the two images being compared, and c_1 and c_2 are small constants that stabilize the division.

Examples of reduced-reference evaluation are:
1) Blocking: Square or rectangular distortion areas in an image. This kind of distortion is likely to be seen at the boundaries between blocks that contain coded coefficients or at the boundary of an intra-coded macroblock. Block distortion is likely to be more significant when the quantization parameter (QP) is higher.
2) Blurring: A degradation that increases with increasing compression, as the contrast between neighbouring pixels is reduced.

V. SIMULATION RESULTS
Figure 9: The simulation waveforms of the metrics.
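For reference, the full-reference metrics defined in eqns (1) to (3) of Section IV can be computed as in the following Python sketch. It is a simplified illustration under stated assumptions: each image is passed as a flat sequence of samples, SSIM is evaluated once over the whole image rather than over local windows, population variances are used, and the constants c1 = (0.01 x 255)^2 and c2 = (0.03 x 255)^2 are the commonly used defaults, which the paper itself does not specify.

```python
import math
from typing import Sequence


def mse(ref: Sequence[float], test: Sequence[float]) -> float:
    """Mean squared error between two equally long sample sequences."""
    return sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)


def psnr_db(ref: Sequence[float], test: Sequence[float], bits: int = 8) -> float:
    """PSNR in dB as in eqn (1); with bits=8 this reduces to eqn (2)."""
    err = mse(ref, test)
    if err == 0:
        return math.inf  # identical images: no error energy
    peak = (1 << bits) - 1
    return 10.0 * math.log10(peak * peak / err)


def ssim_global(x: Sequence[float], y: Sequence[float], bits: int = 8,
                k1: float = 0.01, k2: float = 0.03) -> float:
    """Single-window SSIM as in eqn (3); k1 and k2 are assumed defaults."""
    peak = (1 << bits) - 1
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov_xy + c2)) / (
        (mx * mx + my * my + c1) * (var_x + var_y + c2))
```

Both functions need the original and the extracted sequence, which is what makes them full-reference metrics. A per-frame curve such as the one in Figure 9 would be obtained by calling them once per frame.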
Figure 9 shows the simulation waveforms of the four metrics (PSNR, SSIM, blocking and blurring) plotted against the frame number when noise is present at the decoder side. Because the transmission process itself is not simulated, noise is added manually to produce the disturbance at the decoder side. Figure 10 shows the behaviour of the mean squared error (MSE) as the signal-to-noise ratio (SNR) is varied, and Figure 11 shows the behaviour of the peak signal-to-noise ratio (PSNR) as the SNR is varied.

Figure 10: Comparison waveform of SNR versus MSE.
Figure 11: Comparison waveform of SNR versus PSNR.

The tables below, derived from the comparison waveforms, list how the SNR, MSE and PSNR values vary for three different video clips (screw, rhinos and pets).

TABLE I
COMPARISON TABLE FOR SCREW VIDEO CLIP
SNR MSE PSNR
-20.3050  11.8783
-15.2916  12.3246
-10.2761  12.8705
-5.2407  14.2440
0.1910  16.5544
5.1163  21.5200
10.0346  38.6449
15.0018  63.1510

TABLE II
COMPARISON TABLE FOR RHINOS VIDEO CLIP
SNR MSE PSNR
-20.4794  7.3523
-15.4640  7.6792
-10.4365  8.2899
-5.3897  9.4290
0.3078  11.7833
5.1863  16.8024
10.0574  28.5847
15.0023  61.1243

TABLE III
COMPARISON TABLE FOR PETS VIDEO CLIP
SNR MSE PSNR
-20.4796  7.3479
-15.4645  7.6687
-10.4355  8.3121
-5.3890  9.4415
0.3068  11.8617
5.1867  16.7808
10.0568  28.6786
15.0024  60.1923

VI. CONCLUSION AND FUTURE SCOPE
The H.264 SVC encoder and decoder parts are implemented. The two no-reference metrics, blocking and blurring, are used to determine the effect on video quality of progressing along the degradation path in each scalable dimension. The effect on video quality in the presence of loss is also investigated for each scalable dimension. Our findings indicate that as the spatial resolution decreases and the quantization decreases, the impact of loss on video quality decreases.
Also, loss under temporal degradation has a greater impact on video quality. This work can be used as a reference for selecting suitable dimensions to maximize video quality when constructing the SVC sequence layers, and to save as much bandwidth as possible.

ACKNOWLEDGMENT
We wish to acknowledge HKBK College of Engineering for providing the infrastructure to carry out the development of a soft core for traffic-aware video coding using Scalable Video Coding (SVC).