The RTP Encapsulation based on Frame Type Method for AVS Video

Applied Mechanics and Materials Online: 2012-12-27, ISSN: 1662-7482, Vols. 263-266, pp 1803-1808, doi:10.4028/www.scientific.net/amm.263-266.1803, © 2013 Trans Tech Publications, Switzerland

Weiqiang Wu (a), Lei Wang (b), Qinyu Zhang (c) and Changjian Zhang (d)

Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, 518055, China

(a) wwq520516@163.com, (b) fengqiyunran@163.com, (c) zqy@hit.edu.cn, (d) christopherzcj@gmail.com

Keywords: AVS, RTP, Encapsulation, Frame type.

Abstract. According to the characteristics of AVS video data, an RTP encapsulation method based on frame type is proposed. When video data is encapsulated with the RTP protocol, different types of video data, namely sequence headers, sequence ends, I frames, P frames and B frames, are encapsulated with different methods. Under the limit of the maximum transmission unit (MTU), sequence headers, sequence ends and I frames are encapsulated individually to reduce the packet length and protect the important data. Multiple P frames and B frames are encapsulated into one RTP packet to reduce the number of RTP packets and the load on the link. Simulation results show that, compared with the frame-based encapsulation method, the proposed method effectively reduces the packet loss rate of the video data and improves the quality of video service.

Introduction

AVS (Audio Video coding Standard) [1] is a digital audio and video coding standard developed by the digital audio and video codec technology standards working group. It is China's digital audio and video coding standard with independent intellectual property rights. The coding efficiency of AVS, which has reached an internationally advanced level [2], is comparable to that of MPEG-4 AVC [3]. Applications based on AVS-encoded audio and video streams are increasing, such as network video conferencing, video surveillance and even high-definition television.
The related technology has broad application prospects and huge market value.

AVS audio and video streaming uses RTP (Real-time Transport Protocol) [4] for real-time transmission [5]. The data encapsulation method has an important influence on transmission performance. In this paper, an RTP encapsulation method based on frame type is proposed. It applies a different encapsulation strategy to each AVS frame type to improve transmission performance.

The Existing Encapsulation Methods

An AVS video stream can be divided into video sequences, picture frames, slices, macroblocks and blocks. The structure of the AVS video stream is shown in Fig. 1 [6]. A video sequence is mainly composed of picture frames, each picture frame consists of one or more slices, and each slice is composed of a series of macroblocks. RTP encapsulation packs these data units into RTP packets. The main existing encapsulation methods are as follows.

(1) The hard packet method

The hard packet method is the simplest and most basic encapsulation method. Each RTP packet carries a fixed length of data. The fixed length is usually chosen to be less than the MTU (Maximum Transmission Unit) so that the data is not fragmented at lower layers. However, this method cannot guarantee that a frame header and the corresponding first macroblock are encapsulated in the same RTP packet. Once the frame header is lost, the receiver cannot decode the data.

Fig. 1. AVS video stream structure

(2) Frame-based encapsulation method

This method is recommended by RFC 3016 [7] and has been widely used in the transmission of MPEG-4 streams. Its basic encapsulation unit is the frame of the AVS standard: each frame is encapsulated into a single RTP packet regardless of its length. The method is simple, but it does not take the length of the video frame data into account. If a video frame is longer than the MTU, fragmentation at lower layers makes it more difficult to play back the video at the receiver. If a video frame is short, the overhead of the packet headers increases the load and cost of the network [8].

(3) Macroblock-based encapsulation method

This method treats the macroblock as the smallest encapsulation unit. The macroblocks of video frames are packed into RTP packets that are no longer than the MTU, which avoids fragmentation at lower layers. Because each RTP packet is filled with macroblocks, the overhead of packet headers is significantly reduced. However, the method is relatively complex, and one short video frame may be split across two RTP packets, which makes data reception more difficult.

In the AVS standard, different types of video frames have different lengths and characteristics. The existing encapsulation methods nevertheless process all video frames in the same way and do not consider the characteristics of AVS video data. Hence, this paper proposes an RTP encapsulation method for AVS video based on the frame type.

RTP Encapsulation Method Based on the Frame Type for AVS Video

AVS video frame types and characteristics. An AVS-P2 [9] video sequence is composed of a sequence header, a string of coded picture data and a sequence end.
There are three types of coded pictures: the intra coded picture (I frame), the forward inter coded picture (P frame) and the bidirectional inter coded picture (B frame). Decoding an I frame does not depend on any other frame. Decoding a P frame depends on the preceding I frame or P frame. Decoding a B frame depends on the nearest preceding I frame or P frame and on the nearest following P frame. Usually, an AVS video sequence includes an I frame followed by a series of P frames and B frames, of which B frames are the most numerous.
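The dependency rules just described can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the function name `decodable_frames`, the display-order string of frame types and the set of received indices are hypothetical, and the rules are taken literally from the wording above rather than from the AVS-P2 specification.

```python
def decodable_frames(types, received):
    """Return, per frame, whether it can be decoded.

    types:    frame types in display order, e.g. "IBBPBBP" (hypothetical input)
    received: set of frame indices that arrived at the receiver
    """
    ok = [False] * len(types)

    # Pass 1: reference frames. An I frame needs only itself; a P frame
    # also needs the preceding I or P frame to be decodable.
    prev_ref_ok = False
    for i, t in enumerate(types):
        if t == "I":
            ok[i] = i in received
            prev_ref_ok = ok[i]
        elif t == "P":
            ok[i] = i in received and prev_ref_ok
            prev_ref_ok = ok[i]

    # Pass 2: a B frame needs the nearest preceding I/P frame and the
    # nearest following P frame, following the text's wording.
    for i, t in enumerate(types):
        if t != "B":
            continue
        before = next((j for j in range(i - 1, -1, -1) if types[j] in "IP"), None)
        after = next((j for j in range(i + 1, len(types)) if types[j] == "P"), None)
        ok[i] = (
            i in received
            and before is not None and ok[before]
            and after is not None and ok[after]
        )
    return ok


# Losing the I frame (index 0) makes every frame of the sequence undecodable,
# while losing a single B frame (index 1) affects only that frame.
lost_i = decodable_frames("IBBPBBP", set(range(7)) - {0})
lost_b = decodable_frames("IBBPBBP", set(range(7)) - {1})
```

In this toy model a lost I frame invalidates the whole sequence, a lost P frame invalidates every later frame together with the B frames that reference it, and a lost B frame affects nothing else.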

For decoding at the receiver, the I frame is the most important. It carries the largest amount of data and is the longest; the P frame is shorter and the B frame is the shortest. During video transmission, if the I frame of a sequence is lost or arrives late, the video sequence cannot be decoded even if the P and B frames are received in time. It is therefore important to protect key video data such as I frames during RTP encapsulation. Video frames of different importance should be encapsulated with different methods.

RTP encapsulation method based on the frame type. Considering the characteristics of AVS video data, an RTP encapsulation method based on frame type is proposed. When the video data is encapsulated into RTP packets, sequence headers, sequence ends, I frames, P frames and B frames are handled with different methods in order to decrease the load on the link and protect the key data.

(1) Encapsulation of sequence headers and sequence ends

Sequence headers and sequence ends are generally short but important to the video sequence. Each of them is therefore encapsulated into an RTP packet of its own.

(2) Encapsulation of I frames

An I frame is encapsulated into a single RTP packet of its own if it fits within the maximum RTP payload length. Otherwise, it is divided as evenly as possible among multiple RTP packets.

(3) Encapsulation of P frames and B frames

Because P frames and B frames are shorter, as many of them as possible are encapsulated into one RTP packet.

The encapsulation process in detail is as follows.

Step 1: Receive the video data from the encoder and determine its type. If it is a video sequence header or a sequence end, go to Step 2. If it is coded picture data, go to Step 3.

Step 2: The video sequence header or sequence end is converted to a NALU (Network Abstraction Layer Unit).
The NALU is encapsulated into one RTP packet and transmitted; the process then returns to Step 1 to receive the next data.

Step 3: Determine the frame type and parse out the frame length L. If it is an I frame, go to Step 4. If it is a P frame, go to Step 5. If it is a B frame, go to Step 7.

Step 4: Compare L with J, where J denotes the maximum data length that can be encapsulated into an RTP packet under the MTU limit. If L ≤ J, the frame is converted to one NALU, encapsulated into one RTP packet and sent. The packet is a single NALU packet, whose RTP payload format is shown in Fig. 2.

Fig. 2. Single NALU

If L > J, an integer n is deduced from the inequality J(n-1) ≤ L < Jn. The frame is converted to NALUs and divided as evenly as possible among RTP packets, with the macroblock as the smallest unit, and the RTP packets are then sent. This ensures that the frame header and the first macroblock are encapsulated in the same RTP packet. After Step 4, the process returns to Step 1 to continue encapsulating and sending data.

Step 5: Compare L with J. If L ≤ J, the frame is converted to a NALU and encapsulated into an RTP packet; go to Step 6. If L > J, the frame header and as many complete slices as possible are converted to a NALU, with the slice as the smallest unit, and then encapsulated into an RTP packet and sent. Go to Step 10.

Step 6: Continue by encapsulating the next video frame into the partially filled RTP packet. If the next video frame is an I frame, do not add it to this packet; send the RTP packet and go to Step 4. If the next video frame is a P frame or B frame, compare Lp with H + s1, where Lp is the length of data that can still be encapsulated into the RTP packet, H is the length of the frame header and s1 is the length of the first slice of this video frame. If Lp < H + s1, send the RTP packet and return to Step 3. If Lp ≥ H + s1, compare Lp with the frame length L. If Lp < L, the frame header and as many slices as possible are converted to one NALU, with the slice as the minimum unit, and encapsulated into the RTP packet, which is then sent. In this case, the NALUs encapsulated into the RTP packet have different timestamps, and the RTP packet is called a multi-time aggregation packet. For example, the RTP payload format of a two-time aggregation packet is shown in Fig. 3. Then go to Step 10.

Fig. 3. Multi-Time Aggregation Packet

Step 7: Compare the frame length L with J. If L ≤ J, the frame is converted to a NALU and encapsulated into an RTP packet; go to Step 8. If L > J, the frame header and as many macroblocks as possible are converted to a NALU, with the macroblock as the minimum unit, and encapsulated into an RTP packet. Send the RTP packet and go to Step 11.

When Step 7 is finished, some macroblocks of the frame may not yet have been sent, so the frame cannot be handled in the same way as in Step 3. For the same reason, Step 10 must continue processing when the remaining slices of a video frame have not all been sent.

Step 8: Prepare to encapsulate the next frame. If it is an I frame, send the RTP packet and return to Step 1 to start a new round of processing. If it is a P frame, go to the corresponding processing in Step 6. If it is a B frame, go to Step 9.
Step 9: Compare Lb with H + b1, where Lb is the length of data that can still be encapsulated into the RTP packet and b1 is the length of the first macroblock of this video frame. If Lb < H + b1, send the RTP packet and return to Step 1. If Lb ≥ H + b1, compare Lb with the entire frame length L. If Lb ≥ L, the frame is converted to a NALU and encapsulated into the RTP packet; go to Step 8. If Lb < L, the frame header and as many macroblocks as possible are converted to a NALU, with the macroblock as the minimum unit, and encapsulated into the RTP packet. Send the RTP packet and then go to Step 7.

Step 10: Compare Ls with J, where Ls denotes the length of the remaining slices. If Ls ≤ J, the remaining slices are encapsulated into an RTP packet, and the remaining free length of the RTP packet is calculated. Go to Step 6.

If Ls > J, compare s1 with J, where s1 denotes the length of the first slice in the video frame. If s1 > J, macroblocks are converted to a NALU and encapsulated into an RTP packet, with the macroblock as the minimum unit. Send the RTP packet and then go to Step 11. If s1 ≤ J, as many slices as possible are converted to a NALU and encapsulated into an RTP packet. Send the RTP packet and then go to Step 10.

Step 11: Get the data length of the remaining macroblocks. If it is less than or equal to J, the remaining macroblocks are converted to a NALU and encapsulated into an RTP packet; go to Step 8. If it is greater than J, as many macroblocks as possible are converted to a NALU and encapsulated into an RTP packet. Send the RTP packet and then repeat Step 11.

Simulation and Performance Analysis

We conducted the simulation with the open-source library JRTPLIB [10]. AVS video data was transmitted over a WLAN at link rates of 512 Kbps, 1.5 Mbps and 3 Mbps. The amount of video data is 18.5 Mbytes, the frame rate is 25 fps, and the image resolution is 352×288 (CIF format). The RTP timestamp clock rate is 90 kHz. When a video frame is carried by more than one RTP packet, those packets have the same timestamp. The packet loss rates measured in the simulation are shown in Tables 1 to 3.

Table 1. Total packet loss rate and key packet loss rate when the link rate is 512 Kbps

Test                         1      2      3      4      5      Average
Frame-based, total packets   0.026  0.018  0.024  0.025  0.019  0.0224
Frame type, total packets    0.023  0.018  0.020  0.022  0.017  0.0200
Frame-based, key packets     0.019  0.016  0.018  0.010  0.017  0.0160
Frame type, key packets      0.006  0.008  0.009  0.008  0.010  0.0082

Table 2. Total packet loss rate and key packet loss rate when the link rate is 1.5 Mbps

Test                         1      2      3      4      5      Average
Frame-based, total packets   0.015  0.013  0.014  0.013  0.016  0.0142
Frame type, total packets    0.008  0.013  0.012  0.012  0.009  0.0108
Frame-based, key packets     0.009  0.013  0.012  0.009  0.010  0.0106
Frame type, key packets      0.001  0.001  0.000  0.000  0.001  0.0006

Table 3. Total packet loss rate and key packet loss rate when the link rate is 3 Mbps

Test                         1      2      3      4      5      Average
Frame-based, total packets   0.013  0.012  0.014  0.009  0.010  0.0116
Frame type, total packets    0.010  0.013  0.011  0.007  0.009  0.0100
Frame-based, key packets     0.008  0.011  0.009  0.009  0.010  0.0094
Frame type, key packets      0.000  0.000  0.001  0.000  0.001  0.0004
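As a rough illustration of why the frame-type method sends fewer RTP packets than the frame-based method compared in the tables, the following Python sketch packs a hypothetical list of coded units under a payload limit J. It is a simplified sketch, not the authors' implementation: units are split at byte granularity rather than at slice or macroblock boundaries, NALU and RTP headers are not modeled, and the names `split_evenly` and `pack_by_frame_type`, the unit lengths and the 1400-byte payload limit are all assumptions chosen for illustration.

```python
def split_evenly(length, max_payload):
    """Split a unit longer than max_payload into n near-equal parts,
    with n taken from the inequality J(n-1) <= L < J*n of Step 4."""
    n = length // max_payload + 1
    base, extra = divmod(length, n)
    return [base + 1 if k < extra else base for k in range(n)]


def pack_by_frame_type(units, max_payload):
    """Pack (kind, length) units into packets; each packet is a list of
    (kind, nbytes) pieces whose sizes sum to at most max_payload."""
    packets = []          # finished packets
    current = []          # partially filled aggregation packet (P/B data)
    room = max_payload    # free payload bytes left in `current`

    def flush():
        nonlocal current, room
        if current:
            packets.append(current)
        current, room = [], max_payload

    for kind, length in units:
        if kind in ("SEQ_HEADER", "SEQ_END", "I"):
            flush()  # key data never shares a packet with other units
            if length <= max_payload:
                packets.append([(kind, length)])
            else:
                packets.extend([(kind, part)] for part in split_evenly(length, max_payload))
        else:  # P or B frame: aggregate into the open packet when it fits
            if length > room:
                flush()
            while length > max_payload:  # oversized P/B frame: emit full packets
                packets.append([(kind, max_payload)])
                length -= max_payload
            if length:
                current.append((kind, length))
                room -= length
    flush()
    return packets


# Hypothetical unit lengths in bytes; J = 1400 is an assumed payload limit.
units = [("SEQ_HEADER", 20), ("I", 3000), ("B", 300), ("B", 280), ("P", 700),
         ("B", 310), ("B", 290), ("P", 650), ("SEQ_END", 4)]
packets = pack_by_frame_type(units, 1400)
for p in packets:
    print(p)
```

For these nine hypothetical units, the frame-based method would send nine packets, one per unit, including a 3000-byte I-frame packet that exceeds the payload limit. The sketch sends seven packets, none larger than J, and key data never shares a packet with P or B frames.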

The three groups of experimental data show that both the total packet loss rate and the key packet loss rate of the proposed RTP encapsulation method are lower than those of the typical frame-based encapsulation method, and the difference is more significant for key packets. At 512 Kbps, for example, the average key packet loss rate falls from 0.0160 to 0.0082, while the average total packet loss rate falls from 0.0224 to 0.0200. The frame-type method packs shorter video data, such as P frames and B frames, together into one RTP packet. For the same amount of video data, this significantly reduces the number of RTP packets that need to be transmitted, which relieves the link burden and congestion. At the same time, most RTP packets carrying key data such as sequence headers, sequence ends and I frames are shorter than those of the frame-based method, so the key data is less affected by the wireless channel environment. The RTP encapsulation method based on frame type can therefore protect key data better and improve the quality of service.

Conclusion

This paper proposed an RTP encapsulation method based on frame type. According to the characteristics of AVS video data, key data and less important data are encapsulated with different methods. To reduce the packet length of key data, sequence headers, sequence ends and I frames are encapsulated individually. To reduce the total number of RTP packets and the burden on the link, multiple P frames and B frames are encapsulated into one RTP packet. The method effectively reduces the packet loss rate of the video data, including the key data, and helps to improve transmission quality. It can be used in audio and video transmission systems based on AVS encoding to enhance user experience and satisfaction.

Acknowledgements

This work was financially supported by the National Science and Technology Major Projects of China (2010ZX03004-003-02).

References

[1] ISO/IEC 13818-1. Generic coding of moving pictures and associated audio information: Systems.
[2] ISO/IEC 14496-10. Information technology: Coding of audio-visual objects, Part 10: Advanced video coding.
[3] Lu Yu, Feng Yi, Jie Dong, Cixun Zhang. Overview of AVS-Video: Tools, performance and complexity. In Proc. SPIE, Visual Communications and Image Processing, Beijing, China, Jul. 2005, pp. 679-690.
[4] IETF STD 64, RFC 3550. RTP: A Transport Protocol for Real-Time Applications [S].
[5] RFC 1889. RTP: A Transport Protocol for Real-Time Applications. 1996.
[6] Information technology: Advanced coding of audio and video, Part 2: Video. AVS-P2 Standard draft, 2005.
[7] RFC 3016. RTP Payload Format for MPEG-4 Audio/Visual Streams [S]. 2000.
[8] RFC 3640. RTP Payload Format for Transport of MPEG-4 Elementary Streams [S]. 2003.
[9] ISO/IEC 14496-2. Information technology: Coding of audio-visual objects, Part 2: Visual.
[10] Information on http://research.edm.uhasselt.be/~jori/page/index.php?n=cs. Jrtplib.

Information Technology Applications in Industry, doi:10.4028/www.scientific.net/AMM.263-266