Image Quality Assessment-Oriented Frame Capture of Video Phone over 3G system

Chao Feng, Lianfen Huang*, Jianan Lin and Jinjin Li
School of Information Science and Technology, Xiamen University
Xiamen, 361005, China
E-mail: lfhuang@xmu.edu.cn

Abstract

Image quality assessment plays an important role for 3G network QoE/QoS planners. A great deal of effort has been made in recent years to develop objective image quality metrics that correlate with perceived quality. Unfortunately, there is a lack of non-simulated image material with which to optimize the algorithms and their parameter settings. In this paper, we suggest a method to capture video phone frames on the DTivy™ A2000+H platform. With these frames, objective image quality assessment algorithms can be improved and applied in a more targeted way.

Keywords: Image Quality Assessment; TD-SCDMA; Video Phone; LARENA™

I. INTRODUCTION

With the development of 3G networks, image and video applications are being used more and more widely. Providing high-performance service to customers is an important problem for the service provider. At present, network planners depend on subjective image quality assessment by the human visual system (HVS). Compared with time-consuming and expensive subjective assessment, objective assessment is more convenient and economical for mobile applications, so objective image quality assessment has attracted great interest from researchers. Because most objective algorithms cannot assess quality in real time, the measured video frame must first be captured and saved, and its quality assessed afterwards.

Currently, when researchers develop image quality assessment metrics, they usually adopt a software-simulation technique, i.e. they simulate part of the wireless communication channel in software and transmit a standard test image or video clip over this simulated system.
At the receiver, the degraded image or video clip is captured and saved, and the quality assessment metrics are then applied to it. This simulation method is helpful in the early stage of metric development. However, there are differences between actual and simulated image material: the simulation cannot generate images that contain every kind of distortion pattern. To determine whether a quality assessment metric is suitable for an actual wireless communication system, it must be validated with image material captured through the actual network. We therefore need to capture images through the actual communication system, e.g. a TD-SCDMA video phone.

978-1-4244-6734-1/10/$26.00 2010 IEEE

II. COMMONLY USED OBJECTIVE QUALITY ASSESSMENT METRICS

Objective image quality metrics can be classified according to the availability of the original image signal, which is considered to be distortion-free or of perfect quality. Some objective quality metrics assume that the undistorted reference signal is fully available; this type of metric is called full-reference (FR) image quality assessment. In many practical video service applications, the reference images or video sequences are not accessible. It is therefore desirable to develop measurement approaches that can evaluate image quality blindly. This type of metric is called blind or no-reference (NR) image and video quality assessment. In a third type of image quality assessment method, the original image or video signal is only partially available: certain features are extracted from the original signal and transmitted to the quality assessment system as side information to help evaluate the quality of the distorted image. This is referred to as reduced-reference (RR) image quality assessment [1].

Currently, the most widely used FR objective image quality metrics are mean squared error (MSE) and peak signal-to-noise ratio (PSNR).
Although the MSE and PSNR metrics are not very well matched to the perceived visual quality of a whole video sequence, they are widely used because they are simple to calculate, have clear physical meanings, and are mathematically convenient for optimization. Another commonly used FR metric is Structural Similarity (SSIM), a method for measuring the similarity between two images that has proved to be consistent with HVS perception.

In this paper, we design an approach that captures video phone frames on the DTivy™ A2000+H platform and also provides the reference image. The FR image quality assessment metrics can then be used to evaluate the image quality of the TD-SCDMA video phone.

III. THE PLATFORM INTRODUCTION AND VIDEO FRAME CAPTURE

In a typical 3G video call, each customer transmits both voice and MPEG-4/H.263 video in real time. At the receiver, the terminal decodes the received MPEG-4/H.263 stream and displays the decoded frames on the LCD. The HVS therefore acquires the image information through the LCD of the mobile phone, and image distortions such as blur, blocking and frame freeze also appear on the LCD. Capturing the LCD frame from the frame buffer on the terminal platform during the video call is therefore the best way to obtain images that contain the distortion information.

At the sending terminal, we send a standard test image, instead of the image data acquired by the camera, over the video phone CS64k channel. After transmission over this channel, we locate the LCD frame buffer on the receiving terminal platform and read the image frame data from it. The standard test image is prepared in advance and is taken as the reference image; the captured frame is taken as the distorted image. In this way, we can use FR metrics to evaluate the image quality of the video phone.

A. DTivy™ A2000+H Platform

The development platform is the DTivy™ A2000+H, designed by Leadcore Technology Co., Ltd. It is a total solution for developing TD-SCDMA mobile terminals. The hardware of the platform can be divided into three subsystems: Base Band (BB), Radio Frequency (RF) and Multimedia. Because the 3G system provides many multimedia applications, the multimedia subsystem contains a coprocessor, the MV8650. It mainly handles the MPEG-4/H.263 codec, MP3 decoding, and video/audio input and output, so that the workload of the main processor is minimized. It receives the camera's output and previews the received image by transferring the RGB data directly to the LCD's graphic memory according to the commands [2]. The diagram of the hardware is shown in Figure 1.

Figure.1 The diagram of the platform. (a) The sending terminal, (b) the receiving terminal and (c) image quality assessment

The software toolkit of the DTivy™ solution is called LARENA™. The LARENA™ platform encapsulates the high-layer protocol stack of the TD-SCDMA network and provides an application interface and a device manager interface.
Meanwhile, it also provides a series of tools to support software development and debugging. LARENA™ has two parts: the User Equipment (UE) part and the PC part. The UE part mainly consists of mobile application software, and the PC part mostly contains the simulation environment and other development tools. The most important components of the UE part are the Abstract Operation System (AOS), the Graphical User Interface (GUI), the Man Machine Interface (MMI) Framework and the Platform API. Application services such as Mobile Services and Local Application have been implemented on LARENA™ and are named Service Enables. Together, the Service Enables provide a set of Application Program Interfaces (APIs). The Service Enable API, the GUI API and the AOS API compose the platform API, which lets developers build applications without needing to know the details of the underlying layers. The MMI framework provides an application development mechanism through which applications can conveniently be added to the end-user terminal. The UE part of LARENA™ [3] is shown in Figure 2.

Figure.2 UE part of LARENA™

The video phone service has been implemented on the platform, but there is no application that captures the video frame during a video call. Next we introduce the method for capturing a video frame.

B. Capture the Distorted Image Frame

When a video call is made, the app_vt_process process starts to run. While it runs, the function vt_onkey_options_wnd is executed, which presents the options that can be operated during the video call. In other words, a window containing an option menu pops up when the vt_onkey_wnd function is executing. The steps for adding video frame capture are as follows:

1) In the function vt_onkey_options_wnd, there is an array named ListExItem talk_opt_items[]. Add an entry for the define VT_TALK_OPT_PHOTO to it. After this entry is added, the option menu contains an extra option named "capture".

    ListExItem talk_opt_items[] = {
        /* existing option items are not shown */
        {APP_CAM_STR_PHOTO, N_(""), NULL, VT_TALK_OPT_PHOTO, 0, NULL, 0},
    };

2) Link a handler to the "capture" option: in the function vt_talk_opt_ret, add another case that calls vt_onkey_camera().

    static SINT32 vt_talk_opt_ret(HWND hwnd, WPARAM wparam, LPARAM lparam)
    {
        SINT32 ret = 0;
        switch (wparam) {
        case VT_TALK_OPT_PHOTO:
            vt_onkey_camera(talk_hwnd, -1);
            break;
        }
        return ret;
    }

3) Program the function vt_onkey_camera. First call the API function vt_get_next_filename(fn), which generates a data file. Then call the API function tp_call_vp_cap_remote_pic(fn), which captures the current frame of the video phone and stores the data in the file. There is a raw frame buffer defined as rawbuffer and a JPEG data buffer defined as jpegbuffer. Modify tp_call_vp_cap_remote_pic(fn) so that the RGB565 raw data or the JPEG data can be read from rawbuffer or jpegbuffer. Because the size of a JPEG frame differs greatly from that of an RGB565 frame, memory for the frame data must be allocated according to the data length, and the routine must check that the data has been read through to the end of the buffer. The key routine is as follows:

    FILE *stream;
    stream = tp_fopen(fn, "w");
    if (length != tp_fwrite((uint16 *)buffer, 1, encodemp4.length, stream))
    {
        PRINT("\n tp_fwrite error");
    }

The data length is defined in a structure named encodemp4. The raw frame data is the LCD display data, so its format is RGB565. The size of one frame is 176*144*2 = 50688 bytes (the image resolution of the video phone is QCIF) [4].

After the new option "capture" is added, clicking it during a call captures the current frame of the video phone and stores the frame data in the generated file.
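The write-and-check pattern of the key routine can be tried off the target with the standard C library. The sketch below is only an illustration under stated assumptions: fopen and fwrite stand in for the platform calls tp_fopen and tp_fwrite, whose exact behaviour is not documented here, and the names qcif_rgb565_frame_size and dump_frame are hypothetical. Binary mode ("wb") is a choice made for the sketch, since frame data is not text.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum { QCIF_WIDTH = 176, QCIF_HEIGHT = 144, RGB565_BYTES_PER_PIXEL = 2 };

/* Size in bytes of one raw QCIF RGB565 frame: 176 * 144 * 2 = 50688. */
size_t qcif_rgb565_frame_size(void)
{
    return (size_t)QCIF_WIDTH * QCIF_HEIGHT * RGB565_BYTES_PER_PIXEL;
}

/* Write `length` bytes from `buffer` to the file at `path`.
 * Standard C stand-in for the tp_fopen/tp_fwrite pair on the platform.
 * Returns 0 on success and -1 if the file cannot be opened, if fewer
 * than `length` bytes are written, or if closing the file fails. */
int dump_frame(const char *path, const void *buffer, size_t length)
{
    FILE *stream = fopen(path, "wb");
    if (stream == NULL)
        return -1;

    size_t written = fwrite(buffer, 1, length, stream);
    int close_failed = fclose(stream);

    if (written != length || close_failed != 0)
        return -1;
    return 0;
}
```

For a raw frame the caller would pass qcif_rgb565_frame_size() as the length; for a JPEG frame the length has to come from the encoder, because it varies from frame to frame.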
If the data format is RGB565, a validation routine should be written. In an RGB565 frame, the red component is stored in 5 bits, the green component in 6 bits and the blue component in 5 bits, so each pixel occupies 16 bits, i.e. 2 bytes, of memory. The routine reads 2 bytes at a time from the data file and performs a bitwise AND with 0xF800, 0x07E0 and 0x001F respectively to obtain the red, green and blue components of the pixel [5]. Figure 3 shows the RGB565 verification result. This is the received distorted image for the image quality assessment system.

Figure.3 RGB565 frame

C. Get the Reference Image

The FR metrics assume that the undistorted reference image is fully available. During the video call, when the camera is closed, a prepared standard test image replaces the camera data so that the video call can continue. The LCD at the receiver then displays the test image rather than image data acquired by the camera at the sender. When the "close the camera" item in the option menu is clicked, the function vt_onkey_video_switch() is called. This function switches the video source of the video phone. If the function returns -1, the image data from the camera is not sent; the replacement image that was set in advance is sent instead. The API function tp_vp_repl_img_get() is called in vt_onkey_video_switch(). The prepared test image is used as the reference image [6].

IV. ASSESS THE IMAGE QUALITY

Using the above method, we can obtain the reference image and the distorted image after the source encoding and channel transmission stages. The FR metrics PSNR and SSIM [7] can be used to assess the image quality at every stage of the video call. The images are shown in Figure 4 and the evaluation results are listed in the table.
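Before PSNR or SSIM can be applied to a captured RGB565 frame, its pixels have to be unpacked. The sketch below applies the masks 0xF800, 0x07E0 and 0x001F from Section III.B and is a hedged illustration, not the validation routine used in this work. Several points are assumptions: the names rgb888_t, rgb565_to_rgb888 and rgb565_frame_to_luma are hypothetical, the two bytes of each pixel are assumed to be stored low byte first, the 5- and 6-bit components are expanded to 8 bits by bit replication, and luma is approximated with integer weights 77, 150 and 29 (close to the ITU-R BT.601 coefficients).

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t r, g, b;
} rgb888_t;

/* Unpack one RGB565 pixel using the masks 0xF800, 0x07E0 and 0x001F,
 * then expand each component to 8 bits by replicating its top bits. */
rgb888_t rgb565_to_rgb888(uint16_t px)
{
    uint8_t r5 = (uint8_t)((px & 0xF800u) >> 11);
    uint8_t g6 = (uint8_t)((px & 0x07E0u) >> 5);
    uint8_t b5 = (uint8_t)(px & 0x001Fu);

    rgb888_t out;
    out.r = (uint8_t)((r5 << 3) | (r5 >> 2));
    out.g = (uint8_t)((g6 << 2) | (g6 >> 4));
    out.b = (uint8_t)((b5 << 3) | (b5 >> 2));
    return out;
}

/* Convert a raw RGB565 frame (2 bytes per pixel, low byte first by
 * assumption) into an 8-bit luma plane of `pixels` samples. */
void rgb565_frame_to_luma(const uint8_t *bytes, size_t pixels, uint8_t *luma)
{
    for (size_t i = 0; i < pixels; i++) {
        uint16_t px = (uint16_t)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
        rgb888_t c = rgb565_to_rgb888(px);
        /* Integer approximation of 0.299 R + 0.587 G + 0.114 B. */
        luma[i] = (uint8_t)((77u * c.r + 150u * c.g + 29u * c.b + 128u) >> 8);
    }
}
```

If the platform stores the high byte first, the two byte indices in rgb565_frame_to_luma need to be swapped; a wrong byte order is easy to spot because the decoded image shows strongly distorted colours.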

Figure.4 The images at different stages. (a) Reference image, (b) JPEG image and (c) the image after transfer over the wireless channel

Table. The result of image quality assessment

V. CONCLUSIONS

In this paper, a method for capturing video phone frames is introduced. With this method, we can supply image data from an actual communication system to image quality assessment metrics. Metric developers can use these image data to monitor the quality of the communication system. The data are also helpful for modifying the assessment algorithms so that the evaluation systems perform better.

VI. REFERENCES

[1] Zhou Wang, Hamid R. Sheikh and Alan C. Bovik, The Handbook of Video Databases: Design and Applications, B. Furht and O. Marques, ed., CRC Press, pp. 1041-1078, September 2003.
[2] Mtekvision Co., Ltd, MV8650 DataSheet Version 1.2, pp. 9-11, February 2008.
[3] Wen Cheng, Implementation of a Streaming Protocol Stack Based on ARENA Platform, pp. 10-12, May 2007.
[4] Leadcore Technology Co., Ltd, LARENA™ Platform Video Phone Program Guide, April 2009.
[5] MATLAB Central, http://www.mathworks.com/matlabcentral/fileexchange/
[6] Leadcore Technology Co., Ltd, LARENA™ Platform Video Phone API Reference Guide, April 2009.
[7] Zhou Wang, Al Bovik, Modern Image Quality Assessment, Morgan & Claypool Publishers, February 2006.