OPTIMAL VIDEO SUMMARY GENERATION AND ENCODING (ICIP Draft v0.2)


+Zhu Li, *Aggelos Katsaggelos and +Bhavan Gandhi (ICIP Draft v0.2, -2-23)
+Multimedia Communication Research Lab, Motorola Labs, Schaumburg
*Department of Electrical & Computer Engineering, Northwestern University, Evanston

ABSTRACT

Video summary work originates from the view time constraint: a shorter version of the original video sequence is desirable in some applications. Our work is based on visual significance analysis, where visual significance is a function of the visual features of interest over time. Once the frames in a sequence are labeled with visual significance, one-pass or two-pass frame selection algorithms generate a perceptually optimal video summary according to the visual significance function. We also propose several optimal bit allocation strategies for encoding the video summary.

1. INTRODUCTION

The demand for video summary work originates from security, military and entertainment applications, many of which impose a view time constraint. In a military situation, a battalion commander may request a 5 minutes summary of Company B's fighting at a mountain pass for the last 2 minutes; in a security application, a supervisor may want to see a 2 minutes summary of what happened at airport gate B2, from camera #22, in the last minutes; in an entertainment scenario, a viewer may just want to spend 1 hour to view a 1.5 hour movie. A shorter version of the original video sequence also has the benefit of requiring fewer bits to encode, which makes it attractive as a rate control mechanism.

Examples of recent work in video summary are in [1][2][3][4]. These approaches use various visual features and their statistics to identify video shot boundaries and determine key frames by thresholding and clustering. Generally they are computationally complex and require a two-pass approach, and they do not address the temporal resolution within a video shot gracefully. Our approach is based on visual significance analysis.
Visual significance is a real-valued function defined for each frame to indicate how visually important the frame is for the user to comprehend the event in a video sequence. We obtain the visual significance value from the color layout change from the previous frame and the motion activity within the frame. With a predetermined threshold $T$ on visual significance, we divide the video sequence into video shots and pick the very first frame of each shot as its key frame. Within each video shot, a cumulative visual significance function is computed from the visual significance. Then a step size $\Delta$ is picked to select the temporal enhancement frames. Note that the visual significance and cumulative visual significance functions are causal functions.

The paper is organized as follows: Section 2 covers visual significance analysis; Section 3, video summary frame selection with a one-pass approach; Section 4, video summary generation with a two-pass approach; and Section 5, optimal coding.

2. VISUAL SIGNIFICANCE ANALYSIS

The Visual Significance (VS) function characterizes the importance of frames in the original video sequence in helping users understand the events. VS for frame $i$ is computed as a weighted sum of color change and motion:

$$VS_i = w_1 VS_i^{CLD} + w_2 VS_i^{MAD} \tag{1}$$

where the color change $VS_i^{CLD}$ is the L2 distance between the MPEG-7 [5] Color Layout Descriptors (CLD) [6] of frames $i$ and $i-1$:

$$VS_i^{CLD} = L_2\mathrm{Dist}(CLD_i, CLD_{i-1}) \tag{2}$$

in which $CLD_i$ for frame $i$ is the set of 8x8 DCT transform coefficients of the 8x8 pixel thumbnail image downsampled from the original frame. The DCT in effect performs a Principal Component Analysis (PCA) that captures most of the energy of the frame, while the L2 distance provides a metric for how much has changed from frame to frame.
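The CLD-based term in (2) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the MPEG-7 reference extraction: it works on a single luminance plane, keeps all 64 DCT coefficients, skips the descriptor's quantization and zigzag scan, and downsamples by plain block averaging. The function names (`dct_matrix`, `thumbnail_8x8`, `cld`, `vs_cld`) are hypothetical.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    x = np.arange(n).reshape(1, -1)  # sample index (columns)
    c = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)
    c[1:, :] *= np.sqrt(2.0 / n)
    return c


def thumbnail_8x8(frame: np.ndarray) -> np.ndarray:
    """Downsample a 2-D luminance frame to 8x8 by block averaging (assumed method)."""
    h, w = frame.shape
    h8, w8 = h - h % 8, w - w % 8  # crop so both sides divide by 8
    blocks = frame[:h8, :w8].reshape(8, h8 // 8, 8, w8 // 8)
    return blocks.mean(axis=(1, 3))


def cld(frame: np.ndarray) -> np.ndarray:
    """Simplified color layout descriptor: 2-D DCT of the 8x8 thumbnail."""
    c = dct_matrix(8)
    return c @ thumbnail_8x8(frame.astype(np.float64)) @ c.T


def vs_cld(prev_frame: np.ndarray, cur_frame: np.ndarray) -> float:
    """L2 distance between the descriptors of two consecutive frames, as in (2)."""
    return float(np.linalg.norm(cld(cur_frame) - cld(prev_frame)))
```

Because the DCT matrix used here is orthonormal, the distance between descriptors equals the distance between the two thumbnails; a real CLD, which keeps and quantizes only a few low-frequency coefficients per color channel, would not have this property.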

The motion activity term $VS_i^{MAD}$ is based on the MPEG-7 Motion Activity Descriptor (MAD) [7]:

$$VS_i^{MAD} = \mathrm{VAR}(|MV_i|) \tag{3}$$

which is the variance of the magnitude of the motion vectors (MV) in frame $i$. Note that if frame $i$ is a scene change frame, then the motion VS is set to zero. Examples of visual significance analysis are shown in Figure 1.

[Figure 1. Visual Significance Analysis Examples. Panels: visual significance, bond sequence; visual significance, foreman sequence.]

When one compares the VS plot with the actual video sequence, it is clear that VS captures the importance of frames well. For example, in the foreman sequence, spikes before frame 2 correspond to the head shaking, spikes around frame 25 correspond to the hand waving, and the spikes around 3 correspond to the camera panning. In the bond sequence, which contains several scene cuts and high motion activity scenes, VS captures them accurately as well. Note that the motion VS is not required in visual significance analysis, since the CLD-based VS captures most of the information and is computationally much simpler. We find that the MAD-based VS is needed only in dense motion sequences like soccer, where there is not much color layout change.

By a thresholding operation on the VS function, the original video sequence is divided into video shots, i.e. perceptually consistent groups of video frames. In our experiment we picked threshold T = 2. for the VS function. The foreman sequence is then a single shot and the bond sequence is broken into 6 shots. Note that $T$ can be changed to suit different applications. Within a video shot, the Cumulative Visual Significance (CVS) is computed as the summation of the VS of frames:

$$CVS_i = \begin{cases} \sum_{j=n}^{i} VS_j, & \text{if } VS_i < T \\ 0,\; n = i, & \text{if } VS_i \ge T \end{cases} \tag{4}$$

In (4), $n$ is the last key frame and $T$ is the threshold. Examples of CVS functions are shown in Figure 2 for the foreman and bond sequences.

[Figure 2. Cumulative Visual Significance Analysis. Panels: cumulative visual significance, foreman sequence; cumulative visual significance, bond sequence.]

Note that the slope of the CVS function indicates the rate of visual significance at frame time.
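The motion term (3), the weighted sum (1) and the cumulative function (4) can be sketched as follows. The sketch makes several assumptions that are not in the text: the first frame is always treated as a key frame, the running sum restarts at zero on each key frame and accumulates only the frames after it (one reading of (4)), and the default weights and the function names are hypothetical.

```python
import numpy as np


def motion_vs(motion_vectors, scene_change: bool = False) -> float:
    """Variance of motion vector magnitudes, as in (3); zero on a scene change."""
    if scene_change:
        return 0.0
    mv = np.asarray(motion_vectors, dtype=float)  # shape (num_blocks, 2)
    return float(np.var(np.hypot(mv[:, 0], mv[:, 1])))


def combined_vs(vs_cld: float, vs_mad: float, w1: float = 1.0, w2: float = 0.0) -> float:
    """Weighted sum of color and motion terms, as in (1); weights are placeholders."""
    return w1 * vs_cld + w2 * vs_mad


def cumulative_vs(vs, threshold: float):
    """Causal running sum of VS that resets to zero at every shot boundary."""
    cvs = []
    running = 0.0
    for i, v in enumerate(vs):
        if i == 0 or v >= threshold:
            running = 0.0   # frame i starts a new shot and becomes its key frame
        else:
            running += v    # accumulate significance within the current shot
        cvs.append(running)
    return cvs
```

The loop only looks at the current and earlier frames, which matches the remark that VS and CVS are causal and is what makes a one-pass selection possible.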
The steeper the slope of the CVS function, the more information is contained in the corresponding frames. We design our frame selection mechanism accordingly in the next section.

3. FRAME SELECTION: ONE PASS APPROACH

The VS and CVS functions give us an approximation of how visually significant events or information are distributed among frames. With this information, the video sequence is broken up into video shots, and for each video shot a video summary consisting of a key frame and multiple temporal enhancement frames is selected. The overall operation is illustrated in the pseudo code listed below, in which CVS[i] is the cumulative VS value for frame i, F_SEL[i] is the frame selection indicator, n is the number of frames and Δ is the step size:

    V = 0; i = 0;
    While (i < n) {
        If (CVS[i] == 0)
            F_SEL[i] = K;      #key frame
            V = 0;
        Else if (CVS[i] - V > Δ)
            F_SEL[i] = E;      #enh frame
            V = CVS[i];
        Else
            F_SEL[i] = 0;      #skip
        i = i + 1;
    }

The start of a video shot is identified by thresholding the VS function. The first frame in a video shot is always picked as the key frame for the shot; note that "key frame" here may have a different meaning from that in some previous works. The total number of key frames of a video sequence is determined by the threshold $T$, which is pre-determined by experiment. A stepping operation is performed on the CVS function to select temporal enhancement frames for the shot: the increment of CVS since the last frame selection is computed, and if it is greater than the step size $\Delta$, the frame is selected as a temporal enhancement frame. Examples of video summary frame selection for the bond and foreman sequences are illustrated in Figure 3 and Figure 4.

[Figure 3. Video Summary Frame Selection For Bond. Panel title: Frame Selection: "Bond" Sequence. Curve: cumulative visual significance. Legend: enh frame, key frame; Threshold T = 2., Stepsize Delta = 2.]

[Figure 4. Video Summary Frame Selection For Foreman. Panel title: Frame Selection: "Foreman" Sequence. Curve: cumulative visual significance. Legend: enh frame, key frame; Threshold T = 2., Stepsize Delta = 2.]

Note that more frames are selected on the steeper slopes of the CVS function. This is reasonable since more information is conveyed at those instants. For a one-pass solution, an initial $\Delta$ is picked by the system, and it can be adjusted on the fly according to view-time and bandwidth constraints.

4. FRAME SELECTION: TWO-PASS APPROACH

For a two-pass solution, we have the luxury of fully analyzing the sequence before generating its summary. For a given sequence of $N$ frames, if we want to reduce it to a video summary of $M$ frames, then the step size can be determined as the total visual significance conveyed divided by the total number of frames available for temporal enhancement:

$$\Delta = \frac{\sum_{k=1}^{K} CVS_{n_k}}{M - K} \tag{5}$$

where $K$ is the total number of shots in the sequence, so that we have $K$ key frames and $M-K$ enhancement frames in total, and $n_k$ is the last frame in video shot $k$.
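The one-pass stepping rule and the two-pass step size in (5) can be sketched together. The sketch assumes that the CVS values are already available as a list, that the first frame is a key frame, and that a CVS value of exactly zero marks a key frame, as in the pseudo code; the labels "K", "E", "S" and both function names are hypothetical.

```python
def select_frames(cvs, delta):
    """Label each frame 'K' (key), 'E' (enhancement) or 'S' (skip)."""
    labels = []
    last_selected = 0.0              # CVS value at the most recent selection (V)
    for c in cvs:
        if c == 0:                   # CVS is reset to zero at a shot boundary
            labels.append("K")
            last_selected = 0.0
        elif c - last_selected > delta:
            labels.append("E")       # enough significance accumulated since last pick
            last_selected = c
        else:
            labels.append("S")
    return labels


def two_pass_step_size(cvs, m):
    """Step size from (5): total CVS at shot ends divided by (M - K)."""
    key_frames = [i for i, c in enumerate(cvs) if c == 0]
    k = len(key_frames)
    if m <= k:
        raise ValueError("summary length M must exceed the number of shots K")
    # Each shot ends just before the next key frame; the last shot ends at the last frame.
    shot_ends = [i - 1 for i in key_frames[1:]] + [len(cvs) - 1]
    total = sum(cvs[e] for e in shot_ends)
    return total / (m - k)


if __name__ == "__main__":
    cvs = [0, 1, 3, 4, 0, 2, 5, 9]           # two shots, toy values
    delta = two_pass_step_size(cvs, m=6)      # (4 + 9) / (6 - 2)
    print(delta, select_frames(cvs, delta))
```

In this toy run the summary ends up with five frames rather than the requested six: CVS is only sampled at frame instants and the comparison is strict, so the step size from (5) hits the target length approximately rather than exactly.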
Note that $\Delta$ can also serve as a temporal distortion metric for the video summary.

5. OPTIMAL CODING OF THE VIDEO SUMMARY

For any given video coder, the optimal coding of the video summary becomes a frame-level optimal bit allocation problem. This requires accurate modeling of the frame-level Rate-Distortion function. Attempts to model the rate-distortion curve of a video coder are listed in [8][9]. However, these approaches suffer from inaccuracy when employed in a real video coder. In [10] a numerical solution is proposed, but the computation cost involved is quite high. We try to solve this problem with a compromise between a purely model-based approach and an operational rate-distortion optimization approach. An analytical distortion model is assumed for both intra and inter frames, with parameters providing extra freedom to fit the actual operational Rate-Distortion (R-D) curve of the chosen coder. For intra frames, we assume:

$$d_i = f(b_i; X) \tag{6}$$

and for inter frames, we assume:

$$d_j = g(b_j; Y) \tag{7}$$

where the distortion of intra frame $i$ or inter frame $j$ is a function of the bits spent $b$ and the coding complexity parameter vectors $X$ and $Y$. Functions $f$ and $g$ are convex and can be of inverse proportional or exponential type. Parameter vectors $X$ and $Y$ are obtained by encoding an intra frame and an inter frame with different QPs. A recent work [11] showed that fast estimation of R-D operating points is possible by computing the ratio of zeros in the transform coefficients. Note also that the activity of the coding complexity parameter vectors $X$ and $Y$ is strongly correlated with that of the VS function, so we only update $X$ and $Y$ after the VS activity rises above a certain threshold. We then formulate the optimal coding problem as:

$$\arg\min_{\{b_i, b_j\}} \sum_{i=1}^{K} f(b_i; X) + \sum_{j=1}^{M-K} g(b_j; Y), \quad \text{sub. to: } \sum_{i=1}^{K} b_i + \sum_{j=1}^{M-K} b_j = B \tag{8}$$

which minimizes the average distortion among the video summary frames under the bit budget constraint $B$. Since functions $f$ and $g$ are convex and differentiable, by introducing a Lagrangian multiplier $\lambda$ we can reduce (8) to an unconstrained problem of minimizing $J$:

$$J(\lambda) = \sum_{i=1}^{K} \left[ f(b_i; X) + \lambda b_i \right] + \sum_{j=1}^{M-K} \left[ g(b_j; Y) + \lambda b_j \right] \tag{9}$$

To satisfy the first-order condition, set the derivatives of $J$ to zero:

$$\frac{\partial J}{\partial b_i} = f'(b_i; X) + \lambda = 0, \qquad \frac{\partial J}{\partial b_j} = g'(b_j; Y) + \lambda = 0 \tag{10}$$

Solving (10) together with the total bit budget constraint in (8), we can find the optimal bit allocation $\{b_i, b_j\}$.
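As a worked illustration of (8) to (10), the sketch below assumes the inverse proportional model d = c / b for every frame, with one scalar complexity c per frame. That per-frame scalar, the function name `allocate_bits` and the search bounds are assumptions made for the example, not the paper's model. Under this model the first-order condition -c / b^2 + λ = 0 gives b = sqrt(c / λ), and λ is found by bisection so that the allocations sum to the budget B.

```python
import math


def allocate_bits(complexities, budget, iters=100):
    """Bits per frame minimizing sum(c / b) subject to sum(b) == budget."""

    def bits_for(lam):
        # First-order condition for d = c / b:  -c / b**2 + lam = 0
        return [math.sqrt(c / lam) for c in complexities]

    lo, hi = 1e-12, 1e12             # assumed bracket for the multiplier
    for _ in range(iters):
        mid = math.sqrt(lo * hi)     # geometric midpoint: the bracket spans many decades
        if sum(bits_for(mid)) > budget:
            lo = mid                 # allocation too large, so the multiplier must grow
        else:
            hi = mid
    return bits_for(hi)


if __name__ == "__main__":
    # Three frames with complexities 1, 4 and 9 sharing a budget of 60 bits.
    print(allocate_bits([1.0, 4.0, 9.0], budget=60.0))
```

For this particular model the answer is also available in closed form, with each frame receiving a share of B proportional to sqrt(c); the bisection is shown because it carries over to other convex choices of f and g for which (10) can still be solved frame by frame for a given λ.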
If constant distortion is desired, an alternative formulation is, for a given bit budget $B$, to find the minimum constant distortion $d$ and the optimal bit allocation $\{b_i, b_j\}$ that meet the bit budget:

$$\arg\min_{d}, \quad \text{sub. to: } \sum_{i=1}^{K} f^{-1}(d; X) + \sum_{j=1}^{M-K} g^{-1}(d; Y) = B \tag{11}$$

This can be achieved by bisection search. Work is underway to implement this method with the H.263 reference software TMN8 [12]. Various models and coding complexity parameter estimation methods are under investigation.

6. SPATIAL AND TEMPORAL DISTORTION TRADE OFF

When encoding a sequence of pictures, we need to consider both temporal and spatial distortions. If we define $\Delta$ as the temporal distortion metric and the average MSE distortion $D$ as the spatial distortion metric for a video sequence, then the Rate-Distortion function becomes a convex surface defined on the temporal and spatial distortion axes:

$$R = h(\Delta, D) \tag{12}$$

For any given $R = B$, the admissible $(\Delta, D)$ pairs lie on a curve in the $\Delta$-$D$ plane. To find an optimal solution, a utility function for perceptual quality is defined as:

$$Q = q(\Delta, D) \tag{13}$$

which is concave over the $\Delta$-$D$ plane. An optimal solution can be found by solving the constrained optimization problem:

$$\arg\max_{\{\Delta, D\}} q(\Delta, D), \quad \text{sub. to: } h(\Delta, D) = B \tag{14}$$

Once $\Delta$ is picked by solving (14), together with the VS function threshold $T$ we can select the frames of the optimal video summary, and by solving (10) with the bit budget constraint in (8) we can find the optimal bit allocation among the video summary frames. Work is underway to find parameterized analytic forms of the functions $h$ and $q$. For $q$, a subjective evaluation experiment needs to be set up.

7. EXPERIMENTAL RESULTS

We encoded video summaries of the foreman and bond sequences with fixed threshold T = 2. and various step sizes $\Delta$. The perceptual quality of the video summary degrades gracefully as $\Delta$ increases. This observation is subjective, but an analytical explanation can be found from Figure 3 and Figure 4. For any given $\Delta$, we always pick an enhancement frame for the summary after a fixed amount of visual information, or significant events, has been conveyed. This ensures the temporal smoothness of the video summary compared with some previous clustering approaches. It is difficult to find an objective temporal distortion measurement, but a plot of the bits spent encoding the sequence as a function of $\Delta$ may give some intuitive clue, as shown in Figure 5.

[Figure 5. Bits as a Function of Delta, Bond Sequence. Panel title: bits expenditure as function of delta: bond sequence. Axes: delta (horizontal), bits (vertical).]

Some sample video summaries with different $\Delta$ are available on the web for evaluation at: http://www.ece.northwestern.edu/~zl/research/cme3/demo.html.

8. CONCLUSION AND FUTURE WORK

In this paper we demonstrated a video summary generation method based on visual significance analysis. It is computationally simple and can operate in one-pass and two-pass scenarios. The summary generated by this method achieves graceful degradation of perceptual quality as view time is reduced, and can be used in a variety of applications in security, military and entertainment situations. Work is underway to find an optimal encoding strategy for the video summary, a good compromise between temporal resolution and spatial PSNR quality, and subjective/objective metrics for video summary quality evaluation.

9. REFERENCES

[1] Y. Wang, Z. Liu and J-C. Huang, "Multimedia Content Analysis," IEEE Signal Processing Magazine, vol. 17, November 2000.

[2] H. Sundaram and S-F. Chang, "Constrained Utility Maximization for Generating Visual Skims," IEEE Workshop on Content-Based Access of Image & Video Library, 2001.

[3] A. Girgensohn and J. Boreczky, "Time-Constrained Keyframe Selection Technique," Proc. of IEEE Multimedia Computing and Systems (ICMCS), 1999.

[4] Y. Gong and X. Liu, "Video Summarization with Minimal Visual Content Redundancies," Proc. of Int'l Conference on Image Processing, 2001.

[5] "Information Technology - Multimedia Content Description Interface - Part 3: Visual," ISO/IEC FCD 15938-3.

[6] B. S. Manjunath, J-R. Ohm, V. V. Vasudevan and A. Yamada, "Color and Texture Descriptors," IEEE Trans. on Circuits and Systems for Video Technology, vol. 11, June 2001.

[7] S. Jeannin and A. Divakaran, "MPEG-7 Visual Motion Descriptors," IEEE Trans. on Circuits and Systems for Video Technology, vol. 11, June 2001.

[8] T. Chiang and Y-Q. Zhang, "A New Rate Control Scheme Using Quadratic Rate Distortion Model," IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, February 1997.

[9] H-M. Hang and J-J. Chen, "Source Model for Transform Video Coder and Its Application Part I: Fundamental Theory," IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, April 1997.

[10] L-J. Lin and A. Ortega, "Bit-Rate Control Using Piecewise Approximation Rate-Distortion Characteristics," IEEE Trans. on Circuits and Systems for Video Technology, vol. 8, August 1998.

[11] Z. He, J. Cai and C-W. Chen, "Joint Source Channel Rate-Distortion Analysis for Adaptive Mode Selection and Rate Control in Wireless Video Coding," IEEE Trans. on Circuits and Systems for Video Technology, vol. 12, June 2002.

[12] University of British Columbia, "H.263 Reference Software Model: TMN8."