SURVEILLANCE VIDEO FOR MOBILE DEVICES

Similar documents
Semantic video analysis for adaptive content delivery and automatic description

Quality versus Intelligibility: Evaluating the Coding Trade-offs for American Sign Language Video

No-reference perceptual quality metric for H.264/AVC encoded video. Maria Paula Queluz

Homogeneous Transcoding of HEVC for bit rate reduction

AUDIOVISUAL COMMUNICATION

Express Letters. A Simple and Efficient Search Algorithm for Block-Matching Motion Estimation. Jianhua Lu and Ming L. Liou

Efficient MPEG-2 to H.264/AVC Intra Transcoding in Transform-domain

Advanced Video Coding: The new H.264 video compression standard

International Journal of Emerging Technology and Advanced Engineering Website: (ISSN , Volume 2, Issue 4, April 2012)

Object-Based Transcoding for Adaptable Video Content Delivery

JPEG 2000 vs. JPEG in MPEG Encoding

A Reversible Data Hiding Scheme for BTC- Compressed Images

H.264 to MPEG-4 Transcoding Using Block Type Information

ADAPTIVE PICTURE SLICING FOR DISTORTION-BASED CLASSIFICATION OF VIDEO PACKETS

2014 Summer School on MPEG/VCEG Video. Video Coding Concept

Advanced Encoding Features of the Sencore TXS Transcoder

Complexity Reduced Mode Selection of H.264/AVC Intra Coding

Secure Scalable Streaming and Secure Transcoding with JPEG-2000

CONTENT ADAPTIVE SCREEN IMAGE SCALING

STUDY AND IMPLEMENTATION OF VIDEO COMPRESSION STANDARDS (H.264/AVC, DIRAC)

Technical Recommendation S. 10/07: Source Encoding of High Definition Mobile TV Services

Multimedia Systems Video II (Video Coding) Mahdi Amiri April 2012 Sharif University of Technology

Combined Copyright Protection and Error Detection Scheme for H.264/AVC

Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding.

VIDEO AND IMAGE PROCESSING USING DSP AND PFGA. Chapter 3: Video Processing

Comparative Study of Partial Closed-loop Versus Open-loop Motion Estimation for Coding of HDTV

Deblocking Filter Algorithm with Low Complexity for H.264 Video Coding

Optimal Video Adaptation and Skimming Using a Utility-Based Framework

FOR compressed video, due to motion prediction and

Reduced Frame Quantization in Video Coding

Implementation and analysis of Directional DCT in H.264

Cross-Layer Optimization for Efficient Delivery of Scalable Video over WiMAX Lung-Jen Wang 1, a *, Chiung-Yun Chang 2,b and Jen-Yi Huang 3,c

Outline Introduction MPEG-2 MPEG-4. Video Compression. Introduction to MPEG. Prof. Pratikgiri Goswami

A Framework for Video Streaming to Resource- Constrained Terminals

DIGITAL TELEVISION 1. DIGITAL VIDEO FUNDAMENTALS

Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform

A Comparison of Still-Image Compression Standards Using Different Image Quality Metrics and Proposed Methods for Improving Lossy Image Quality

Reduction of Blocking artifacts in Compressed Medical Images

One-pass bitrate control for MPEG-4 Scalable Video Coding using ρ-domain

CHAPTER 4 REVERSIBLE IMAGE WATERMARKING USING BIT PLANE CODING AND LIFTING WAVELET TRANSFORM

Reducing/eliminating visual artifacts in HEVC by the deblocking filter.

Fast Region-of-Interest Transcoding for JPEG 2000 Images

A deblocking filter with two separate modes in block-based video coding

CODING METHOD FOR EMBEDDING AUDIO IN VIDEO STREAM. Harri Sorokin, Jari Koivusaari, Moncef Gabbouj, and Jarmo Takala

Recommended Readings

Complexity Reduction Tools for MPEG-2 to H.264 Video Transcoding

Chapter 11.3 MPEG-2. MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications:

Mesh Based Interpolative Coding (MBIC)

An Adaptive MPEG-4 Streaming System Based on Object Prioritisation

Digital Video Processing

WHITE PAPER ON2 TECHNOLOGIES, INC. TrueMotion VP7 Video Codec. January 10, 2005 Document Version: 1.0

MRT based Adaptive Transform Coder with Classified Vector Quantization (MATC-CVQ)

CONTENT ADAPTIVE COMPLEXITY REDUCTION SCHEME FOR QUALITY/FIDELITY SCALABLE HEVC

Contribution of CIWaM in JPEG2000 Quantization for Color Images

EE 5359 MULTIMEDIA PROCESSING SPRING Final Report IMPLEMENTATION AND ANALYSIS OF DIRECTIONAL DISCRETE COSINE TRANSFORM IN H.

Hybrid Video Compression Using Selective Keyframe Identification and Patch-Based Super-Resolution

A Hybrid Temporal-SNR Fine-Granular Scalability for Internet Video

Coding of 3D Videos based on Visual Discomfort

VIDEO DENOISING BASED ON ADAPTIVE TEMPORAL AVERAGING

Image Quality Assessment Techniques: An Overview

Low-Complexity, Near-Lossless Coding of Depth Maps from Kinect-Like Depth Cameras

An Adaptive Video Compression Technique for Resource Constraint Systems

Ferre, PL., Doufexi, A., Chung How, J. T. H., Nix, AR., & Bull, D. (2003). Link adaptation for video transmission over COFDM based WLANs.

Joint Impact of MPEG-2 Encoding Rate and ATM Cell Losses on Video Quality

VIDEO COMPRESSION STANDARDS

Module 6 STILL IMAGE COMPRESSION STANDARDS

JPEG 2000 A versatile image coding system for multimedia applications

Evaluating the Streaming of FGS Encoded Video with Rate Distortion Traces Institut Eurécom Technical Report RR June 2003

TRADITIONAL adaptation approaches transform the video

Context based optimal shape coding

Fast Wavelet-based Macro-block Selection Algorithm for H.264 Video Codec

Video Quality Analysis for H.264 Based on Human Visual System

New Techniques for Improved Video Coding

Integrating semantic analysis and scalable video coding for efficient content-based adaptation*

For layered video encoding, video sequence is encoded into a base layer bitstream and one (or more) enhancement layer bit-stream(s).

AUDIOVISUAL COMMUNICATION

Video Compression An Introduction

Testing HEVC model HM on objective and subjective way

Improved H.264/AVC Requantization Transcoding using Low-Complexity Interpolation Filters for 1/4-Pixel Motion Compensation

In the name of Allah. the compassionate, the merciful

An Implementation of Multiple Region-Of-Interest Models in H.264/AVC

Image Classification for JPEG Compression

Smart Video Transcoding Solution for Surveillance Applications. White Paper. AvidBeam Technologies 12/2/15

OPTIMIZATION OF LOW DELAY WAVELET VIDEO CODECS

Image Compression Algorithm and JPEG Standard

A Study of Image Compression Based Transmission Algorithm Using SPIHT for Low Bit Rate Application

User-Friendly Sharing System using Polynomials with Different Primes in Two Images

Video Adaptation: Concepts, Technologies, and Open Issues

Rate Distortion Optimization in Video Compression

Digital Image Processing

Video coding. Concepts and notations.

Cross-Layer Perceptual ARQ for H.264 Video Streaming over Wireless Networks

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

DCT-BASED IMAGE QUALITY ASSESSMENT FOR MOBILE SYSTEM. Jeoong Sung Park and Tokunbo Ogunfunmi

System Modeling and Implementation of MPEG-4. Encoder under Fine-Granular-Scalability Framework

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

Scalable Video Coding

Block-Matching based image compression

A COMPARATIVE STUDY OF QUALITY AND CONTENT-BASED SPATIAL POOLING STRATEGIES IN IMAGE QUALITY ASSESSMENT. Dogancan Temel and Ghassan AlRegib

Multidimensional Transcoding for Adaptive Video Streaming

Transcription:

SURVEILLANCE VIDEO FOR MOBILE DEVICES Olivier Steiger, Touradj Ebrahimi Signal Processing Institute Ecole Polytechnique Fédérale de Lausanne (EPFL) CH-1015 Lausanne, Switzerland {olivier.steiger,touradj.ebrahimi}@epfl.ch Andrea Cavallaro Multimedia and Vision Lab Queen Mary, University of London Mile End Road, London E1 4NS, UK andrea.cavallaro@elec.qmul.ac.uk ABSTRACT In this paper, we present a video encoding scheme that uses object-based adaptation to deliver surveillance video to mobile devices. The method relies on a set of complementary video adaptation strategies and generates content that matches various appliance and network resources. Prior to encoding, some of the adaptation strategies exploit video object segmentation and selective filtering in order to improve the perceived quality. Moreover, object segmentation enables the generation of automatic summaries and of simplified versions of the monitored scene. The performance of individual adaptation strategies is assessed using an objective video quality metric, which is also used to select the strategy that provides maximum value for the user under a given set of constraints. We demonstrate the effectiveness of the scheme on standard surveillance test sequences and realistic mobile client resource profiles. 1. INTRODUCTION The problem of remote visual surveillance of unattended environments has received growing attention in recent years. But whereas event monitoring is still mostly performed by human operators located in a fixed surveillance room, there is an increasing demand for the delivery of surveillance video to mobile devices as well. The latter notably enables surveillance personnel to monitor a critical situation without interruption even while shifting to the intervention place. Moreover, it permits the surveillance of private ground and vacant homes using cellular phones and PDAs. However, reliable remote surveillance requires the quality of the delivered video to be optimal despite the limitations resulting from small display sizes and restricted processing capabilities of mobile devices. In addition to this, one must cope with the limited bandwidth and time-varying conditions of wireless transmission channels. Traditionally, scalable video coding [1] and content-blind transcoding [2] have been used to adapt video to the restricted capabilities of mobile terminals and networks. However, scalable video coding requires specific decoding capabilities to access individual quality or resolution layers. Moreover, scalable video streams are not optimal in terms of the required bandwidth. The above problems are solved by using transcoding, where the video is adapted to the capabilities of the receiver at the encoder s side. However, traditional transcoding techniques (content-blind techniques) are generally not optimal in terms of perceptual quality. Thus, recent transcoding methods (content-based techniques) make use of content characteristics in order to minimize the degradation of important image regions. In particular, objectbased transcoding considers the usage of video objects as transcoding entities. That is, foreground objects are encoded at a higher quality level or resolution than less important regions [3, 4]. While the works in [3, 4] resort to object-based encoders (e.g., MPEG 4) to code different image regions individually, we present in this paper a method that exploits an object-based representation in a traditional frame-based encoding framework (e.g., MPEG 1). The rationale behind this choice is to enable the use of advanced functionalities with standard decoders available for consumer devices. Also, the additional knowledge provided by object-based analysis can further be exploited to meet the restricted capabilities of mobile devices. The remainder of this paper is organized as follows. In Section 2, we discuss the generation and delivery of video that matches the resources of mobile devices in an optimal way. In Section 3, results obtained with real surveillance sequences are presented and discussed. Finally, the conclusions of our work are drawn in Section 4. 2. DELIVERY OF SURVEILLANCE VIDEO Adaptive delivery ensures that the delivered video matches the limited capabilities of mobile appliances in an optimal way. This is achieved by transforming the video using a number of complementary adaptation strategies, and by selecting the strategy that provides most perceptual quality for the end user.

(a) (c) (d) (e) Fig. 1. Background simplification for compression improvement and objects enhancement. (a) The original background from the Highway sequence is replaced by a static background shot. The original background is lowpass-filtered. (c) Each background macroblock is replaced by its DC value. (d) An edge image is used instead of the original background. (e) Background areas are set to a constant value. 2.1. Video adaptation strategies The uncompressed input video is first transformed using one out of the following adaptation strategies: coded original; spatial resolution reduction; semantic prefiltering. The coded original is simply obtained by encoding the input video using a frame-based encoder, such as MPEG 1. Spatial resolution reduction can further be applied prior to the coding in order to reduce the transmission bandwidth. Semantic prefiltering aims at mimicking the way humans treat visual information in order to improve the compression ratio of image and video coders, and to enhance relevant objects [5]. To enable semantic prefiltering, image areas that observers are looking at (foreground) need to be separated from areas that are not expected to attract the attention of a viewer (background) by means of video object segmentation [6]. The overall image quality is then improved by simplifying the background in order to improve the quality (i.e. the associated bit allocation) of the foreground. This is achieved by replacing the original background by a static background shot, by lowpass-filtering the background, or by replacing each background macroblocks by its DC value (Fig. 1(a)-(c)). Alternatively, superfluous visual details may be removed from the background to enhance relevant objects (Fig. 1(d)-(e)). (a) Fig. 2. Use of object segmentation to meet restricted device capabilities. (a) Relevant objects are put in a conspicuous situation on small displays. Surveillance video is summarized in a single image. The additional knowledge provided by object segmentation can further be exploited to meet the restricted capabilities of mobile devices. In Fig. 2(a), relevant video objects have been put in a conspicuous situation by means of colored blobs. This is particularly useful on small displays. In Fig. 2, the surveillance video has been summarized in a single frame by plotting the trajectories of semantic video objects on top of a static background shot. This can be used to convey the meaning of the filmed scene when video capabilities are not available. The perceptual quality resulting from the individual adaptation strategies is then evaluated by means of objective evaluation. An objective video distortion measure that emulates human judgement needs to account for different image areas and for their relevance to the observer. This aspect can be considered with the traditional Mean Squared Error (MSE) by weighting different image areas according to their semantics. This leads to the semantic mean squared error, SMSE, defined as [5]: SMSE = N k=1 w k C k (i,j) C k d 2 (i, j), (1) where N is the number of classes and w k the weight of class k. Class weights are chosen depending on the semantics, with w k 0, k =1,...,N and N i=1 w k =1. C k is the set of pixels belonging to the object class k, and C k is its cardinality. The error d(i, j) between the original image I O and the distorted image I D in Eq. (1) is the pixel-wise color distance. The color distance is computed in the 1976 CIE Lab color space in order to consider perceptually uniform color distances with the Euclidean norm. The final quality evaluation metric, the semantic peak signal-to-noise ratio SPSNR, uses SMSE instead of MSE as compared to PSNR. When the classes are foreground and background, then N =2in Eq. (1), and w f is the foreground weight. The background weight is thus (1 w f ).

2.2. Strategy selection 40 UMTS ADSL 40 UMTS ADSL Strategy selection is at last needed to work out the adaptation strategy that provides most perceptual quality for the end user, considering the individual resources of the connected client (i.e., appliance, network). Specifically, let A i be some original item, e.g., a video. The adapted version M ijk is computed by transcoding A i using the adaptation operator O j at resources k. Each adaptation operator O j implements an adaptation strategy in Section 2.1. The perceptual quality of M ijk resulting from the adaptation is denoted by Q(M ijk ). Let us furthermore define the item resource vector for the item M ijk as R(M ijk )= ( R(Mijk ) 1,R(M ijk ) 2,...,R(M ijk ) r) T, where r is the number of different resources that have to be considered (e.g., bitrate, resolution, coding format, etc.). Similarly, the client resource vector is denoted by = ( R 1 client,r2 client,...,rr client) T. The selection of the optimal adaptation strategy can then be formalized by the following resource allocation problem: max j,k R n (M ijk ) R n client for all 1 n r In order to solve Problem 1, we define a number of anchor nodes ( O j, R(M ijk ),V(M ijk ) ). An anchor node expresses the quality Q(M ijk ) resulting from applying adaptation operator O j at resources R(M ijk ). We further fit a polynomial quality function (QF) to the anchor nodes of each adaptation operator. The quality function matrix for A i is denoted as a i1,1 a i1,2... a i1,p ( ) F QF i = f QF i1, f QF QF T a i2,1 a i2,2... a i2,p i2,...,fij =........, a ij,1 a ij,2... a ij,p (3) where a ij,k are the coefficients of the order p 1 polynomial quality function, f QF ij. J is the number of adaptation operators. The solution to Problem 1 is { then given by the QF that has maximum quality max j,r f QF ij (R) } such that R. In the particular case where all QFs are monotonically increasing, the solution is located at R =. 3. RESULTS In this section, the proposed adaptive delivery framework is tested with surveillance sequences and realistic client resource profiles. In particular, the mechanism discussed in SPSNR [db] 35 30 25 20 15 (1) Coded original (2) Spatial resolution reduction (3a) Lowpass filtered background (3b) Static background 200 300 400 500 600 700 800 900 1000 Bitrate [Kbit/s] (a) SPSNR [db] 35 30 25 20 15 (1) Coded original (2) Spatial resolution reduction (3a) Lowpass filtered background (3b) Static background 200 300 400 500 600 700 800 900 1000 Bitrate [Kbit/s] Fig. 3. Rate-distortion diagrams for strategy selection. Anchor nodes are represented along with the corresponding cubic polynomial value functions. The client resources under analysis are highlighted using vertical lines. (a) Hall monitor. Highway. Section 2.2 is used to select the adaptation strategy that provides most quality for the end user. The results are verified by visual inspection and by objective quality evaluation using SPSNR. Problem 1 For item A i, find the adapted version M ijk that The test sequences are Hall Monitor from the MPEG has maximum quality Q(M ijk ) such that item resources R(M ijk ) 4 test sequences, and Highway from the MPEG 7 test sequences. Both sequences are in CIF format at 25 Hz; the do not exceed client resources : { } length is 300 frames. In our experiments, the following Q(M ijk ) such that (2) frame-based adaptation strategies are compared: (a) coded original sequence; spatial resolution reduction; (c) semantic prefiltering with lowpass-filtering; (d) semantic prefiltering with static background. A single resource, i.e. bitrate, is considered. In order to assess the performance of the selection mechanism both for low-quality and for highquality video, the bitrate of the client has been set to R UMTS client = 176 Kbit/s and to Rclient ADSL = 1000 Kbit/s. The former corresponds to the bandwidth supported by the UMTS multimedia protocol. The latter is sometimes used for video streaming over asymmetric digital subscriber lines (ADSL). The anchor nodes have been calculated for the following bitrates: 150, 200, 250, 300, 500 and 1000 Kbit/s. In the rate-distortion diagrams in Fig. 3, each data point represents one anchor node. A cubic function is further fit to the anchor nodes of each adaptation strategy. For Hall monitor, evaluating the value function at 176 Kbit/s leads to the following maximal SPSNR: 27.4 db for coded original (a); 22.6 db for spatial resolution reduction ; 25.3 db for semantic prefiltering with lowpassfiltering (c); 29.4 db for semantic prefiltering with static background (d). Thus, according to Eq. (2), the adaptation strategy that provides most perceived quality for the end user is semantic prefiltering with static background (d). At 1000 Kbit/s, the maximal SPSNR is: 35 db for the coded original; 22.8 db for spatial resolution reduction; 27.2 db for semantic prefiltering with lowpassfiltering; 35.6 db for semantic prefiltering with static back-

ground. Thus, the selected adaptation strategy is semantic prefiltering with static background (d) as well. For Highway, the resource allocation problem is solved in a similar way. The adaptation strategies that provide most quality are found to be semantic prefiltering with static background (d) at 176 Kbit/s, and the coded original sequence (a) at 1000 Kbit/s. These results are next verified by visual inspection and by objective quality evaluation. The former is done by inspecting sample frames from sequences coded using different strategies. The latter is achieved by measuring SPSNR at 176 Kbit/s and at 1000 Kbit/s for each strategy. In Fig. 4, sample frames are shown for the sequence Hall monitor. At 176 Kbit/s (left column), the person s face and the monitor have slightly more details with the semantic strategies (c) and (d) than with the non-semantic strategies (a) and. Also, the background is severely corrupted by coding artifacts in the coded original (a). This is particularly visible on background edges. At 1000 Kbit/s (right column), spatial resolution reduction and lowpass-filtered background (c) have substantially lower quality than the coded original (a) and static background (d). On the other hand, it is difficult to perceive differences between the coded original (a) and static background (d). In Fig. 5, sample frames are shown for the sequence Highway. At 176 Kbit/s, the background of semantic prefiltering with static background (d) has higher quality than the background of the coded original (a). In particular, the white painted lines on the road are sharper with static background (d). At 1000 Kbit/s however, the shadow cast by the truck stops in an unnatural way in static background (d). These artificial boundaries result from the object segmentation process used by the semantic prefiltering step. These boundaries are visually annoying and lead to a lower perceptual quality for static background (d) than for the coded original (a). The SPSNR for the two test sequences coded at 176 Kbit/s and at 1000 Kbit/s using different adaptation strategies is given in Table 1. As expected, the highest objective quality for Hall monitor is achieved by using semantic prefiltering with static background at both 176 Kbit/s and 1000 Kbit/s. For Highway, the highest SPSNR obtained by using semantic prefiltering with static background at 176 Kbit/s, and by the coded original at 1000 Kbit/s. 4. CONCLUSIONS We presented a video encoding scheme that uses object segmentation based on motion to increase the perceived quality of surveillance video as well as to meet the restricted capabilities of mobile devices. The scheme is used to select among different adaptation strategies in realistic content delivery situations and has been demonstrated on surveillance test sequences. Both visual inspection and objective quality BITRATE 176 Kbit/s 1000 Kbit/s Hall monitor Coded original 27.5 db 35.0 db Spatial resolution reduction 22.7 db 22.8 db Lowpass-filtered background 25.3 db 27.2 db Static background 29.4 db 35.6 db Highway Coded original 29.0 db 35.8 db Spatial resolution reduction 23.4 db 23.5 db Lowpass-filtered background 27.7 db 30.1 db Static background 29.8 db 35.1 db Table 1. SPSNR for the sequences Hall monitor and Highway coded at 176 Kbit/s and at 1000 Kbit/s using different adaptation strategies. evaluation results confirm that the adaptive delivery framework described in this paper is capable to determine the adaptation strategy that leads to the best perceptual video quality. In fact, the strategies that have been selected for delivery have also the highest SPSNR in all tested cases. As part of our future work, we would like to point out that the discussed method requires quality to be computed explicitly for each candidate strategy. Such calculations are time-consuming and need at the moment to be performed offline. A solution to this problem is quality function prediction, where the quality is estimated for each strategy based on content features instead of being actually computed. 5. REFERENCES [1] Yao Wang, Jörn Ostermann, and Ya-Qin Zhang, Video Processing and Communications, Prentice Hall, Upper Saddle River, USA, 2001. [2] Anthony Vetro, Charilaos Christopoulos, and Huifang Sun, Video transcoding architectures and techniques: an overview, IEEE Signal Processing Magazine, vol. 20, no. 2, pp. 18 29, March 2003. [3] Anthony Vetro, Huifang Sun, and Yao Wang, Object-based transcoding for adaptable video content delivery, IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 387 401, March 2001. [4] R. Cucchiara, C. Grana, and A. Prati, Semantic video transcoding using classes of relevance, International Journal of Image and Graphics, vol. 3, no. 1, pp. 145 169, January 2003. [5] Andrea Cavallaro, Olivier Steiger, and Touradj Ebrahimi, Semantic video analysis for adaptive content delivery and automatic description, IEEE Transactions on Circuits and Systems for Video Technology, 2005 (to appear). [6] Andrea Cavallaro, Olivier Steiger, and Touradj Ebrahimi, Tracking video objects in cluttered background, IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 4, April 2005.

(a) (c) Fig. 4. Frame 190 from Hall monitor for different adaptation strategies. The coding bitrates are: (left column) 176 Kbit/s; (right column) 1000 Kbit/s. The strategies under analysis are: (a) Coded original. Spatial resolution reduction. (c) Lowpass-filtered background. (d) Static background. (d)

(a) (c) Fig. 5. Frame 20 from Highway for different adaptation strategies. The coding bitrates are: (left column) 176 Kbit/s; (right column) 1000 Kbit/s. The strategies under analysis are: (a) Coded original. Spatial resolution reduction. (c) Lowpassfiltered background. (d) Static background. (d)