No-reference perceptual quality metric for H.264/AVC encoded video
Tomás Brandão (IT / ISCTE), Maria Paula Queluz (IT / IST)
VPQM 2010, Scottsdale, USA, January 2010
Outline
1. Motivation and proposed work
2. Technical description
3. Subjective quality assessment
4. Results
5. Conclusions
1 Motivation & Proposed work
No-reference metrics: why use them?
- Image communication services: video streaming, mobile TV, IP TV
- New applications:
  - Scalable billing schemes: users pay according to the quality they get
  - QoE-oriented network resource optimization: when adjusting network parameters, perceived quality should also be considered
- No information about the original is available at the receiver, hence the need for no-reference image quality assessment
Proposed approach: if the original media were available
[Block diagram: Original media and Received media feed a Local error measurement stage; a Perceptual model drives Local error weighting; the output is a Quality score (PSNR)]
Proposed approach: since the original media is not available
[Block diagram: bitstream data from the Received media (motion vectors, quantized coefficients, quantization steps) feeds a Local error estimation stage; a Perceptual model drives Local error weighting; the output is a Quality score (PSNR)]
2 Technical description
Error estimation
For quantization noise, the local squared error of a DCT coefficient can be estimated as its conditional expectation over the quantization interval:

  E[(c − ĉ)² | ĉ] = (1 / P(ĉ)) ∫ from b_lo to b_hi of (c − ĉ)² p(c) dc

where p(c) is the DCT coefficient distribution (PDF), P(ĉ) is the probability of having value ĉ at the quantizer's output, and [b_lo, b_hi] are the boundaries of the quantization interval. PSNR can then be computed in the standard way. Note that p(c) has to be estimated using the distorted (quantized) DCT coefficient data.
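The expected squared error over a quantization interval can be evaluated numerically. A minimal sketch, assuming a zero-mean Laplace PDF for the coefficients (the model the next slide uses for P and B frames); the interval bounds and reconstruction value below are illustrative:

```python
import numpy as np

def expected_sq_error(c_hat, b_lo, b_hi, lam, n=4000):
    """Expected squared quantization error E[(c - c_hat)^2 | c in [b_lo, b_hi)]
    under a zero-mean Laplace coefficient PDF with rate parameter lam,
    via numerical integration over the quantization interval."""
    c = np.linspace(b_lo, b_hi, n)
    pdf = 0.5 * lam * np.exp(-lam * np.abs(c))
    p_interval = np.trapz(pdf, c)        # P(c_hat): probability mass of the interval
    return np.trapz((c - c_hat) ** 2 * pdf, c) / p_interval

# Illustrative case: a coefficient reconstructed to 12, interval [8, 16)
err = expected_sq_error(c_hat=12.0, b_lo=8.0, b_hi=16.0, lam=0.25)
```

Because the Laplace PDF decays across the interval, the result differs from the uniform-noise value of step²/12 that a blind estimate would give.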
H.264 DCT coefficient distributions
Common models (one distribution per spatial frequency):
- zero-mean Laplace PDF for P and B frames
- zero-mean Cauchy PDF for I frames
Estimating parameters from quantized data [1]
- Log-likelihood maximization: maximize the probability of having the observed values at the quantizer's output
- Linear prediction: for 4×4 DCT, the PDF parameter at each frequency is predicted from its neighbors, with linear weights found through training
- The prediction and ML estimates are combined, with weights driven by the ratio of coefficients quantized to 0

[1] T. Brandão and M. P. Queluz, "No-reference PSNR estimation algorithm for H.264 encoded sequences," EUSIPCO 2008, Lausanne, Switzerland, August 2008.
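The ML step above can be sketched as interval-censored likelihood maximization: each quantized coefficient only tells us which quantization interval the original value fell into. A minimal Python illustration, assuming a zero-mean Laplace PDF and a plain uniform quantizer (no deadzone, unlike the actual H.264 quantizer):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def laplace_cdf(c, lam):
    """CDF of a zero-mean Laplace distribution with rate parameter lam."""
    neg = 0.5 * np.exp(lam * np.minimum(c, 0.0))
    pos = 1.0 - 0.5 * np.exp(-lam * np.maximum(c, 0.0))
    return np.where(c < 0, neg, pos)

def ml_lambda(intervals):
    """ML estimate of lam: maximize the log-probability of the observed
    quantization intervals (the values seen at the quantizer output)."""
    lo = np.array([i[0] for i in intervals])
    hi = np.array([i[1] for i in intervals])
    def neg_loglik(lam):
        p = laplace_cdf(hi, lam) - laplace_cdf(lo, lam)
        return -np.sum(np.log(np.maximum(p, 1e-300)))
    return minimize_scalar(neg_loglik, bounds=(1e-4, 10.0), method='bounded').x

# Synthetic check: quantize Laplacian coefficients, then recover lam
rng = np.random.default_rng(0)
lam_true, step = 0.2, 4.0
c = rng.laplace(0.0, 1.0 / lam_true, size=5000)
idx = np.round(c / step)                                # uniform mid-tread quantizer
intervals = [((i - 0.5) * step, (i + 0.5) * step) for i in idx]
lam_hat = ml_lambda(intervals)
```

The synthetic check recovers a value close to the true parameter even though only quantized data is observed, which is the property the slide's estimator relies on.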
Perceptual model
Based on the spatio-temporal contrast sensitivity function (CSF) proposed by Kelly and Daly. It models the HVS sensitivity to luminance changes as a function of the spatial frequency, ρ, and of the retinal velocity, v.
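A sketch of this CSF, using the constants from Kelly's 1979 fit (Daly's refinement adds further scaling terms that are omitted here; treat the exact constants as assumptions of this sketch):

```python
import numpy as np

def kelly_csf(rho, v):
    """Spatio-temporal CSF in the form fitted by Kelly (1979):
    rho = spatial frequency (cycles/degree), v = retinal velocity (deg/s)."""
    v = np.maximum(v, 0.1)                      # guard against log10(0)
    k = 6.1 + 7.3 * np.abs(np.log10(v / 3.0)) ** 3
    rho_max = 45.9 / (v + 2.0)                  # peak frequency shifts with velocity
    return k * v * (2.0 * np.pi * rho) ** 2 * np.exp(-4.0 * np.pi * rho / rho_max)
```

The band-pass shape is what drives the error weighting: mid spatial frequencies are weighted more heavily than very high ones, and the sensitive band moves as the retinal velocity changes.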
Distortion metric
The local perceptual error is given by weighting the estimated local error with the perceptual (CSF) model. The global distortion value for the whole video frame, D_f, is computed using L4 error pooling over the local errors. Finally, the same pooling process is applied along time to get a global distortion value, D_g, for the encoded video sequence.
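The two pooling stages can be sketched as follows; the error values below are hypothetical numbers, used only to illustrate the mechanics:

```python
import numpy as np

def minkowski_pool(errors, p=4):
    """Lp (Minkowski) error pooling; p = 4 emphasizes the largest local errors."""
    e = np.abs(np.asarray(errors, dtype=float))
    return np.mean(e ** p) ** (1.0 / p)

# Hypothetical local perceptual errors for two frames:
per_frame_blocks = [np.array([0.1, 0.2, 2.0]), np.array([0.1, 0.1, 0.1])]
D_f = [minkowski_pool(b) for b in per_frame_blocks]   # spatial pooling per frame
D_g = minkowski_pool(D_f)                             # temporal pooling over frames
```

With p = 4 the pooled value sits between the mean and the maximum, so a few strong local distortions dominate the score, mirroring how viewers judge quality.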
3 Subjective quality assessment
Selection of test material
- Seven video sequences, covering a wide spatio-temporal activity range:
  - CIF format (352 × 288 pixels), 30 Hz frame rate, 10 second duration each
- Sequences encoded with the JM H.264 software:
  - 4 different bit rates in the range 32 to 2048 kbit/s
  - GOP of 15 frames, with structure IBBPBBP…
  - only the 4 × 4 transform size was allowed
Methodology
- Subjective tests were performed in accordance with Rec. ITU-T P.910, following the Degradation Category Rating (DCR) method
- In each trial, the viewer is presented a pair of video sequences (reference, then impaired), followed by a voting period
- The observer judges the quality of the impaired video with respect to the reference, using a five-grade scale (1: very annoying, to 5: imperceptible)
- Main conditions: 22 observers; 20 min test duration; LCD display
- Resulting MOS and video sequences are available online at http://amalia.img.lx.it.pt/~tgsb/h264_test/
4 Results
PSNR estimation
[Plots: PSNR along time; PSNR by frame type (I-frames, P-frames, B-frames)]
Quality scores
A logistic function, with parameters a0 to a3, was used to map D_g values into MOS values. The parameters were computed through a non-linear least squares method, using half of the assessed video sequences as training set.

Metric's performance indicators:
- Root mean square error: 0.383
- Pearson correlation coefficient: 0.953
- Spearman rank order coefficient: 0.946
- Outliers ratio: 0.071
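The fitting step can be sketched with a standard 4-parameter logistic. The slide only names the parameters a0 to a3, so the exact functional form below is an assumption, and the (D_g, MOS) pairs are hypothetical numbers for illustration only:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(d, a0, a1, a2, a3):
    """An assumed 4-parameter logistic for mapping distortion scores to MOS:
    tends to a0 for small d and to a1 for large d (a2 = slope, a3 = midpoint)."""
    return a1 + (a0 - a1) / (1.0 + np.exp(a2 * (d - a3)))

# Hypothetical (D_g, MOS) training pairs, for illustration only:
d = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 5.0])
mos = np.array([4.8, 4.4, 3.5, 2.6, 1.9, 1.4])

# Non-linear least squares, as on the slide (here on the toy data)
params, _ = curve_fit(logistic, d, mos, p0=[5.0, 1.0, 1.0, 2.5], maxfev=10000)
rmse = np.sqrt(np.mean((logistic(d, *params) - mos) ** 2))
```

In the paper's setup, the fit would use the D_g values of the training half of the sequences against their measured MOS, and the RMSE/correlation indicators would be computed on the held-out half.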
Conclusions
- A no-reference video quality assessment method was presented
- It comprises a local error estimation module, followed by an error weighting module based on a spatio-temporal CSF model
- Estimated MOS values are well correlated with the human perception of quality
Future work:
- Consider a more complete perceptual model (e.g., incorporating contrast masking)
- Deal with transmission errors (i.e., packet losses)
Thank you for your attention.