Video shot segmentation using late fusion technique


Video shot segmentation using late fusion technique
by C. Krishna Mohan, N. Dhananjaya, B. Yegnanarayana
in Proc. Seventh International Conference on Machine Learning and Applications, San Diego, California, USA, pp. 267-270, Dec. 11-13, 2008.
Report No: IIIT/TR/2008/170
Centre for Language Technologies Research Centre, International Institute of Information Technology, Hyderabad - 500 032, INDIA
December 2008

2008 Seventh International Conference on Machine Learning and Applications

Video Shot Segmentation using Late Fusion Technique

C. Krishna Mohan (1), N. Dhananjaya (2), and B. Yegnanarayana (3)
(1) National Institute of Technology Karnataka Suratkal, Srinivasnagar - 575 025, Karnataka, India
(2) Indian Institute of Technology Madras, Chennai - 600 036, India
(3) International Institute of Information Technology, Hyderabad - 500 032, India
email: chalavadi km@yahoo.com, dhanu@cs.iitm.ernet.in, yegna@iiit.ac.in

Abstract

In this paper, a new method for detecting shot boundaries in video sequences using a late fusion technique is proposed. The method uses the color histogram as the feature, and processes each bin separately to detect shot boundaries. The decisions from the individual bins are later combined to hypothesize the presence of shot boundaries. The method provides a certain degree of robustness against illumination changes and camera/object motion, as it ignores small changes in the bins. While early fusion techniques rely on the extent of change in the color information, the proposed technique relies on the number of significant changes. Experimental results validate the new method and show that it can effectively detect both abrupt and gradual transitions.

Keywords: video segmentation, shot boundary detection, early fusion, late fusion, video content analysis.

1. Introduction

Content-based indexing and retrieval of digital video is an active research area. Video segmentation is the first preprocessing step in analyzing video content for indexing and browsing. Video segmentation, or shot boundary detection, involves the temporal segmentation of video sequences into elementary units, called shots. A shot in a video is a contiguous sequence of video frames recorded from a single camera operation, representing a continuous action in time and space [4]. Shot transitions can be abrupt (cuts) or gradual (fades, dissolves and wipes).
Many techniques have been developed to detect video shot boundaries. A good review and comparison of existing methods is presented in [6, 5]. Gradual transitions are generally more difficult to detect, as they can often be confused with camera/object motion. Detection of gradual transitions, such as fades and dissolves, is examined in [3, 7]. The color histogram is one of the most widely used features for detecting shot transitions [3, 8]. The feature vectors representing color information are generally large and sparse, prompting a reduction in dimensionality; singular value decomposition [1] and independent component analysis [9] have been examined for this purpose. A wide variety of dissimilarity measures have been used in the literature [9, 1]. Some of the commonly used measures are the Euclidean distance, cosine dissimilarity, Mahalanobis distance and log-likelihood ratio. Information-theoretic measures such as mutual information and joint entropy between consecutive frames have also been proposed for detecting cuts and fades [2].

In this paper, we propose a method for shot boundary detection which involves late fusion of the decisions obtained by processing individual bins separately. The early fusion technique computes the net change over all the bins or dimensions, which can be large even when the change in each individual bin is small. This can lead to false hypotheses of shot boundaries, thereby bringing down the performance. The late fusion technique provides robustness against such cases, which are typically caused by illumination changes and camera/object motion. Experimental results indicate that the proposed method can effectively detect both abrupt and gradual transitions.

The remainder of this paper is organized as follows. In Section 2, we propose two modifications to the early fusion algorithm. In Section 3, the proposed late fusion technique for shot boundary detection is described. Experimental results are discussed in Section 4.
Section 5 gives the conclusions of this work.

978-0-7695-3495-4/08 $25.00 2008 IEEE  DOI 10.1109/ICMLA.2008.88

[Figure 1. (a) Probability distribution of shot boundaries in terms of the number of color bins changing significantly. (b) Cumulative distribution of (a).]

2. Shot Boundary Detection by Early Fusion

Shot boundary detection involves testing, at every frame index n of a given video of length N_v frames, the following two hypotheses:

    H_0: A shot boundary exists at frame index n.
    H_1: No shot boundary exists at frame index n.    (1)

Let X = {x_1, x_2, ..., x_n, ..., x_Nv} be the sequence of feature vectors of dimension p representing the N_v frames in the video. Testing the hypotheses at frame index n involves computing a dissimilarity value, d[n] = d(X_L, X_R), between two sequences of N feature vectors, X_L = {x_{n-1}, x_{n-2}, ..., x_{n-N}} and X_R = {x_n, x_{n+1}, ..., x_{n+N-1}}, to the left and right of n, respectively. The value of N can vary from one frame to a few frames (corresponding to less than one or two seconds). If the dissimilarity value is greater than a threshold τ[n], the hypothesis H_0, that a shot boundary exists, is chosen.

In this work, the Euclidean distance is used as the dissimilarity measure. An adaptive threshold is computed using the variance of a few frames before the frame at which the hypothesis is tested. Let d[n] = d_euc(µ_L, µ_R) denote the Euclidean distance between the means of X_L and X_R. If σ_L[n], computed from the covariance matrix Σ_L of X_L, represents the amount of variability within the block of N frames to the left of n, then the dynamic threshold is computed as τ_L[n] = β σ_L[n], where β is a scaling parameter. The first set of shot boundaries is hypothesized using the dynamic-threshold technique described above with a window size of N = 1.
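As a rough sketch of the forward pass described above (not the authors' implementation: the ten-frame window used for the variability estimate and the reduction of the covariance matrix Σ_L to a scalar via its trace are assumptions, since the paper does not spell these out), the dynamic-threshold rule might look as follows:

```python
import numpy as np

def early_fusion_boundaries(features, beta=5.0, N=1, var_window=10):
    """Forward-pass early-fusion sketch (illustrative only).

    features : (num_frames, p) array of color-histogram feature vectors.
    A boundary is hypothesized at frame n when the Euclidean distance
    between the mean histograms of the N frames to the left and to the
    right of n exceeds beta times a variability estimate sigma_L[n].
    The scalar reduction used here (sqrt of the trace of the covariance
    of the preceding var_window frames) is an assumption.
    """
    num_frames = features.shape[0]
    boundaries = []
    for n in range(max(N, var_window), num_frames - N + 1):
        mu_L = features[n - N:n].mean(axis=0)       # mean of left block
        mu_R = features[n:n + N].mean(axis=0)       # mean of right block
        d = np.linalg.norm(mu_L - mu_R)             # d_euc(mu_L, mu_R)
        cov = np.cov(features[n - var_window:n].T)  # variability before n
        sigma_L = np.sqrt(np.trace(cov))
        if d > beta * sigma_L:                      # dynamic threshold tau_L[n]
            boundaries.append(n)
    return boundaries
```

Because the threshold adapts to recent variability, a noisy but static shot raises sigma_L and suppresses false alarms, while an abrupt change in the mean histogram stands out.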
In order to reduce the number of misses, the video is also processed in the reverse (or backward) direction, which is equivalent to comparing the dissimilarity value with τ_R[n], the amount of variability to the right of n. The condition for hypothesizing a shot boundary thus becomes

    d_euc(µ_L, µ_R) > τ_L[n]  ∨  d_euc(µ_L, µ_R) > τ_R[n]    (2)

The use of OR (∨) logic in the bidirectional processing of the video increases the number of false hypotheses, which are reduced by validating the hypothesized boundaries using the same condition as in Eq. 2, but with a larger window size, say N = 10.

3. Shot Boundary Detection by Late Fusion

The early fusion technique described in the previous section was based on the overall change in the color histogram between adjacent frames in a video sequence. The dimension of the color histogram (512 in this case) was chosen to provide adequate representation for each color component. However, not all components of the color histogram feature vectors are populated for a given frame of video. Moreover, not all components of the color histogram change significantly in the neighbourhood of a shot boundary. It is observed that, in general, a small number of color bins undergo a significant change when there is a shot boundary. Figs. 1(a) and (b) show the probability and cumulative distributions of the number of bins changing significantly at the actual shot boundaries, for a threshold factor of β = 5. It can be seen from Fig. 1(b) that around 50% of the shot boundaries have 50 or fewer bins changing significantly (approximately 10% of the total number of bins), and around 82% of the shot boundaries have 100 or fewer bins (approximately 20%) changing significantly. Hence a shot boundary can be detected by observing a significant change in a small number of bins. At the same time, if all components of the color histogram are considered in the computation of dissimilarity, as is the case in early fusion, even a small contribution from each component results in a large value of the overall dissimilarity. This is typically the case when frames in a video sequence change gradually due to object/camera motion and intensity variations, even when there is no shot boundary.

To overcome the problem of false hypotheses due to small changes accumulated over a large number of bins, we propose to use the number of bins changing significantly as the measure for hypothesizing a shot boundary. We call this the late fusion technique, since the components of the color histogram are first observed for significant change, and only then included in the decision-making process. The condition for hypothesizing a shot boundary is exactly the same as in the early fusion technique outlined in the previous section, except that it is applied to individual bins separately. If the number of bins voting for a shot boundary exceeds a threshold M = 20, a shot boundary is hypothesized. The optimal threshold M_opt corresponds to the best F_1 measure. Our observation from the experiments is that fewer than 20 components of the color histogram are sufficient for the detection of shot boundaries, provided that these components change significantly in the vicinity of a shot boundary.

4. Experimental Results

The performance of the shot boundary detection is evaluated on a database of approximately 2 hours of news video.

Table 1. Video data used for shot boundary detection experiments.

    Clip ID   Duration (min)   # cuts   # graduals
    BBC-1     22               125      18
    BBC-2     23               154      29
    CNN-1     24               76       72
    CNN-2     24               43       47
    NDTV-1    27               135      7
    Overall   120              533      173
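The per-bin voting rule of the late fusion technique can be sketched as follows (an illustration, not the authors' code; the per-bin adaptive thresholds are assumed to be computed analogously to the early-fusion threshold and are passed in precomputed here):

```python
import numpy as np

def late_fusion_boundary(hist_left, hist_right, bin_thresholds, M=20):
    """Late-fusion decision at one candidate frame (illustrative sketch).

    hist_left, hist_right : mean color histograms (length-p vectors) of
    the frame blocks to the left and right of the candidate frame.
    bin_thresholds : length-p vector of per-bin adaptive thresholds
    (beta times a per-bin variability estimate, computed elsewhere).
    Each bin votes independently; a shot boundary is hypothesized when
    more than M bins change significantly, rather than when the overall
    distance is large.
    """
    votes = np.abs(hist_left - hist_right) > bin_thresholds
    return int(votes.sum()) > M
```

Note how a drift spread thinly over many bins (illumination change, slow motion) produces few votes even though its accumulated Euclidean distance could be large, which is exactly the failure mode of early fusion that the vote count avoids.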
For each video sequence, a human observer determined the precise location and duration of the edits, which are used as ground truth. The database contains a total of 533 cuts and 173 gradual transitions, the details of which are shown in Table 1. The video clips were captured at a rate of 25 frames per second, at 320x240 pixel resolution, and stored in AVI format. A 512-dimensional RGB color histogram, obtained by quantizing the 3-D color space into an 8x8x8 grid, is used as the feature vector.

The performance of the shot boundary detection task is measured in terms of recall R = N_c/N_m and precision P = N_c/(N_c + N_f), where N_m is the total number of actual (or manually marked) shot boundaries, N_c is the number of shot boundaries detected correctly, and N_f is the number of false alarms. Good performance requires both the recall and the precision to be high. The choice of the threshold factor β is crucial: a small value of β improves the recall while reducing the precision, and a large value of β has the reverse effect. A compromise between recall and precision is obtained by using a measure combining the two, given by F_1 = 2RP/(R + P).

The performance of shot boundary detection using forward and backward processing of the video by early fusion of evidence is given in Table 2. The performance after combining the evidence obtained using forward and backward processing is given in Table 3. The optimal threshold factor β_opt corresponds to the best F_1 measure. It is to be noted that the optimal threshold factor is different for different clips, and also for the forward and backward directions of the same clip. It can also be seen that the OR logic improves the performance, while the AND logic reduces the optimal F_1 value. This is mainly due to the high probability that one of the two sides of a shot boundary has a large variance among the feature vectors.
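The evaluation measures above translate directly into code (the function name is illustrative):

```python
def shot_detection_scores(n_correct, n_actual, n_false):
    """Recall, precision, and F1 as defined in the paper.

    n_correct : shot boundaries detected correctly (N_c)
    n_actual  : actual, manually marked boundaries (N_m)
    n_false   : false alarms (N_f)
    """
    recall = n_correct / n_actual                    # R = N_c / N_m
    precision = n_correct / (n_correct + n_false)    # P = N_c / (N_c + N_f)
    f1 = 2 * recall * precision / (recall + precision)
    return recall, precision, f1
```

F_1 is the harmonic mean of recall and precision, so it is pulled toward the smaller of the two, which penalizes a β that buys recall at the cost of precision (or vice versa).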
The performance of shot boundary detection by late fusion of decisions along individual dimensions is given in Table 4. A comparison of Tables 3 and 4 indicates that the performance of the late fusion algorithm is comparable to that of early fusion (when OR logic is used for combination).

5. Conclusions

In this paper, we have presented a new method for video shot boundary detection based on the late fusion of evidence. The early fusion technique computes the net change over all the bins or dimensions, which can be large even when the change in each individual bin is small. This can lead to false hypotheses of shot boundaries, thereby bringing down the performance. The late fusion technique provides robustness against such cases, which are typically caused by illumination changes and camera/object motion. The simulations show that the method achieves good performance in detecting both abrupt and gradual transitions. To further improve the performance of shot boundary detection, the evidence from late fusion can be combined with the evidence from early fusion to exploit the advantages of both methods; this will be our future work.

Table 2. Performance of shot boundary detection using forward and backward processing of video by early fusion of evidence.

               Forward Processing              Backward Processing
    Clip ID    β_opt   R      P      F_1      β_opt   R      P      F_1
    BBC-1      10.0    0.881  0.926  0.903    8.0     0.902  0.942  0.921
    BBC-2      9.5     0.852  0.912  0.881    6.5     0.858  0.844  0.851
    CNN-1      8.0     0.878  0.909  0.893    6.5     0.892  0.904  0.898
    CNN-2      11.0    0.867  0.897  0.881    9.0     0.844  0.854  0.849
    NDTV-1     12.5    0.768  0.908  0.832    6.5     0.873  0.780  0.824
    Overall    9.5     0.854  0.889  0.871    6.5     0.890  0.820  0.853

Table 3. Performance of shot boundary detection by combining the evidence obtained using forward and backward processing of video.

               Combined (OR)                   Combined (AND)
    Clip ID    β_opt   R      P      F_1      β_opt   R      P      F_1
    BBC-1      10.0    0.937  0.918  0.927    3.0     0.881  0.592  0.708
    BBC-2      9.5     0.902  0.878  0.889    3.0     0.863  0.583  0.696
    CNN-1      8.0     0.953  0.898  0.925    3.0     0.736  0.407  0.524
    CNN-2      13.0    0.878  0.940  0.908    3.0     0.700  0.240  0.358
    NDTV-1     9.5     0.901  0.837  0.868    3.0     0.852  0.531  0.654
    Overall    10.0    0.908  0.882  0.895    3.0     0.806  0.470  0.588

Table 4. Performance of shot boundary detection by late fusion of decisions along individual dimensions.

    Clip ID    M_opt   R      P      F_1
    BBC-1      19      0.930  0.911  0.920
    BBC-2      17      0.880  0.880  0.880
    CNN-1      12      0.865  0.914  0.889
    CNN-2      12      0.900  0.976  0.936
    NDTV-1     17      0.930  0.841  0.883
    Overall    15      0.888  0.877  0.882

References

[1] Z. Cernekova, C. Kotropoulos, and I. Pitas. Video shot segmentation using singular value decomposition. In Proc. Int. Conf. Acoustics, Speech and Signal Processing, Hong Kong, pp. 181-184, Apr. 6-10, 2003.
[2] Z. Cernekova, C. Nikou, and I. Pitas. Shot detection in video sequences using entropy-based metrics. In Proc. IEEE Int. Conf. Image Processing, pp. 421-424, 2002.
[3] M. Drew, Z.-N. Li, and X. Zhong. Video dissolve and wipe detection via spatio-temporal images of chromatic histogram differences. In Proc. IEEE Int. Conf. Image Processing, pp. 929-932, 2000.
[4] U. Gargi, R. Kasturi, and S. H. Strayer. Performance characterization of video-shot-change detection methods. IEEE Trans. Circuits and Systems for Video Technology, 10(1):1-13, Feb. 2000.
[5] A. Hanjalic. Shot-boundary detection: unraveled and resolved? IEEE Trans. Circuits and Systems for Video Technology, 12(2):90-104, Feb. 2002.
[6] R. Lienhart. Reliable transition detection in videos: A survey and practitioners guide. Int. Journal of Image and Graphics, pp. 469-486, 2001.
[7] B. T. Truong, C. Dorai, and S. Venkatesh. New enhancements to cut, fade and dissolve detection processes in video segmentation. In ACM Int. Conf. on Multimedia, pp. 219-227, Nov. 2000.
[8] S. Tsekeridou and I. Pitas. Content-based video parsing and indexing based on audio-visual interaction. IEEE Trans. Circuits and Systems for Video Technology, 11(4):522-535, 2001.
[9] J. Zhou and X.-P. Zhang. Video shot boundary detection using independent component analysis. In Proc. Int. Conf. Acoustics, Speech and Signal Processing, Philadelphia, USA, pp. 541-544, Mar. 18-23, 2005.