A Rapid Scheme for Slow-Motion Replay Segment Detection

Wei-Hong Chuang, Dun-Yu Hsiao, Soo-Chang Pei, and Homer Chen

Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan 10617, R.O.C.
{r92942040, r92942028}@ntu.edu.tw, {pei, homer}@cc.ee.ntu.edu.tw

Abstract. Efficient data mining for digital video has become increasingly important in recent years. In this paper, we present a new scheme for automatic detection of slow-motion replays in sports video. Several slow-motion features and some newly discovered characteristics of slow-motion segments are exploited to aid the detection. The first step of our method is based on macroblock motion vector information, while the second step uses the frame-to-frame difference under an MC-DCT structure to verify the output of the first step. The last step refines the segment boundaries. Compared with previous approaches, our method improves both speed and accuracy and strikes a balance between efficiency and simplicity.

1 Introduction

As recent advances in digital video coding and transmission have made digital video very popular, it becomes more and more difficult for end users to go through all the video information they receive. Sports video programs are a good example. Someone may be interested in many games played in a day but not have the time to watch all of them, or even one of them in full. A tool for automatic detection of important events in sports video and for summary presentation will thus be very useful.

Many approaches to event detection have been proposed [2], [6], [7], [10]. In addition to the cues used by these approaches, slow-motion replays usually indicate the occurrence of important events. If these replay segments can be detected effectively, we can extract them from the original video and present them as a meaningful kind of highlight. In this paper, we propose a new scheme for slow-motion replay segment detection (SMRSD).
First, we apply a novel procedure that scans macroblock reference directions to find slow-motion replay segment candidates. Next, a DCT-domain procedure is used to refine the candidates and discard false ones. This procedure also checks frame differences but avoids inverse DCT operations. Lastly, the resulting replay segments are concatenated and presented to the viewer. The detection scheme operates entirely in the compressed domain. As a result, its time cost is low and its results are reliable.

K. Aizawa, Y. Nakamura, and S. Satoh (Eds.): PCM 2004, LNCS 3331, pp. 239-246, 2004. © Springer-Verlag Berlin Heidelberg 2004
2 Previous Work

The SMRSD issue has been addressed in the literature by several researchers [1], [4], [10]. In general, there are two ways to generate slow-motion effects. For videos captured by a standard-speed camera, the slow-motion effect can be generated by frame repetition or frame interpolation [2]. (Note: The frame repetition method is more widely adopted because of its simplicity and maturity.) For videos recorded with a high-speed camera, the slow-motion effect can be generated by simply playing out the video at the normal speed. Since standard-speed cameras are much more widely used (and cheaper) than high-speed cameras [1], in this paper we concentrate on slow-motion videos generated by frame repetition.

Pan et al. [1] proposed an approach to the detection of slow-motion segments. They defined the frame difference as

$$D(n) = \sum_{p=1}^{M} \sum_{q=1}^{N} \bigl(I_n(p,q) - I_{n-1}(p,q)\bigr)^2,$$

where $I_n(p,q)$ denotes a macroblock in frame $n$. In a slow-motion region, because of the repeated frames, $D(n)$ changes more abruptly than it does in a normal play region. An example is shown in Fig. 1, where it can be seen that $D(n)$ exhibits a clear pattern in which a large value is followed by several near-zero values. That is, $D(n)$ crosses its mean value more often in a slow-motion segment than in a standard-speed segment. In other words, $D(n)$ has a higher zero-crossing rate in slow-motion segments. Pan et al. [1] defined the zero-crossing count as

$$Z_c(n, \theta_k) = \sum_{i=1}^{L-1} \operatorname{trld}\bigl(D(n-i) - \bar{D}(n),\; D(n-i-1) - \bar{D}(n),\; \theta_k\bigr),$$

where $\bar{D}(n)$ is the mean value of $D(n)$ and

$$\operatorname{trld}(x, y, t) = \begin{cases} 1 & \text{if } x \ge t \text{ and } y \le -t, \\ 0 & \text{otherwise}. \end{cases}$$

Accordingly, $p_{zc}(n)$ is defined as a measure of the fluctuation of $D(n)$:

$$p_{zc}(n) = \arg\max_{k} \{ Z_c(n, \theta_k) \ge \beta \}.$$

As suggested in [2], the threshold values are set to $L = 7$, $\theta_k = k$, and $\beta = 1$. The reader is referred to [1], [2], and [4] for more details.
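The measure described above can be illustrated with a short Python sketch. It is not the implementation of [1]; several details are assumptions made only for illustration: the mean of D(n) is taken over the L+1 most recent values, trld counts a crossing when x is at least t and y is at most -t, the arg max is read as the largest index k whose threshold still yields at least β crossings, and `num_levels` (the number of thresholds tried) is a hypothetical parameter.

```python
import numpy as np

def frame_difference(frames):
    """D(n): sum of squared differences between frame n and frame n-1.

    frames has shape (num_frames, M, N). D[0] is set to 0 because
    frame 0 has no predecessor.
    """
    frames = np.asarray(frames, dtype=np.float64)
    d = np.zeros(len(frames))
    d[1:] = np.sum((frames[1:] - frames[:-1]) ** 2, axis=(1, 2))
    return d

def trld(x, y, t):
    """Return 1 if x >= t and y <= -t, else 0 (assumed reading)."""
    return 1 if (x >= t and y <= -t) else 0

def zero_crossings(d, n, theta, L=7):
    """Z_c(n, theta): thresholded crossings of the local mean of D."""
    if n < L:
        raise ValueError("need at least L earlier values of D")
    # Assumption: the mean is taken over D(n-L) .. D(n).
    mean = np.mean(d[n - L : n + 1])
    return sum(
        trld(d[n - i] - mean, d[n - i - 1] - mean, theta)
        for i in range(1, L)
    )

def p_zc(d, n, L=7, beta=1, num_levels=12):
    """Largest k (with theta_k = k) such that Z_c(n, theta_k) >= beta.

    Returns 0 if no threshold level reaches beta crossings.
    num_levels is a hypothetical cap on the thresholds tried.
    """
    best = 0
    for k in range(1, num_levels + 1):
        if zero_crossings(d, n, theta=k, L=L) >= beta:
            best = k
    return best

if __name__ == "__main__":
    # Frame repetition: one large difference followed by two zeros.
    slow = np.array([9.0, 0.0, 0.0] * 4)
    normal = np.full(12, 5.0)
    print(p_zc(slow, 11), p_zc(normal, 11))
```

In this toy input, the repeated-frame pattern yields a positive measure while the flat difference signal yields zero, which mirrors the large-value-then-near-zero behavior described for Fig. 1. In practice D(n) would need to be normalized so that the thresholds θ_k = k are meaningful; that scaling is not specified here.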
Fig. 1. (a) D(n) of a video segment consisting of a normal play region and a slow-motion region, (b) the zoomed-in view of the normal play region, and (c) the zoomed-in view of the slow-motion region. (d) A slow-motion segment, a portion of which has a weak zero-crossing pattern.

3 New Discovery and Compressed Domain SMRSD

Our tests indicate that zero crossing is a prevalent characteristic of slow-motion segments. However, we have also noted that some slow-motion segments have weak zero-crossing patterns (as displayed in Fig. 1(d)). In such cases, relying purely on zero crossing may not give satisfactory results.

Based on the observation of many sports videos, we have found that slow-motion segments are likely to begin and end with a scene change such as a logo flash or a scene wipe. These scene changes serve as a clue to the appearance of slow-motion segments. If the segment boundaries are detected first, one may improve the computational efficiency by localizing the zero-crossing operations to only a few frames within each segment.

Pan et al. [1] used an HMM to detect the segment boundaries. This method requires training the HMM. However, different kinds of video may have different characteristics, and one HMM is not general enough to work for all of them.

The method developed by Pei and Chou [5] detects scene changes based on the reference direction of each macroblock of bi-directional prediction (B) frames. The B frames before and after a scene change tend to have different reference directions, as shown in Fig. 2. This method performs consistently well for typical videos such as MTV and TV commercials, but it generates incorrect results for sports video, where there is a significant amount of motion due to camera panning or zooming.

Fig. 2. Scene change detection based on the reference direction of bi-directional prediction (B) macroblocks. P stands for prediction macroblock, and I for intra macroblock.

While the detection method described in [5] fails for the specific application targeted in this paper, we found that it exhibits distinctively different behavior for slow-motion segments: the number of scene changes it detects is far greater than that for normal play regions. This discovery inspired us to develop the new SMRSD scheme discussed here. The false alarms for scene changes generated by the detection method [5] signal the existence of slow-motion replay segments. That is, the more scene changes it detects, and the closer together they are, the more likely the segment is a slow-motion segment. This is illustrated in Fig. 3.

After further examining the phenomena described above, we found that slow-motion segments usually exhibit a particular pattern: the frame remains still for some time and then changes to the next one. This freeze-and-play pattern is ubiquitous in slow-motion replays, so the macroblocks are likely to refer to different directions (most macroblocks of one frame refer forward, and most macroblocks of the adjacent frame refer backward) when an abrupt shift between frames takes place. Hence it makes sense to detect slow-motion segments in a video by detecting large changes in the reference direction of macroblocks.

3.1 Step I. Slow-Motion Segment Detection

The purpose of this step is to obtain an initial segmentation of the video sequence. A high density of occurrences in Fig. 3 implies the existence of a slow-motion region. Thus we extract from the input video sequence those segments in which large changes in reference direction are detected with high density.
This includes an analysis of the spacing between occurrences generated by the detection method [5]. A threshold is applied to select slow-motion segments from all possible candidates. To obtain better detection results, we started with many voting rules that account for the amount of change in reference direction, then discarded or added voting rules according to their contribution to the detection results and placed emphasis on the most effective one, which is the reference pattern shown in Fig. 2. This greatly enhances the performance of the method and distinguishes it from other slow-motion detection methods. After such a refinement, the method is effective at detecting jerky changes in video sequences, which is the phenomenon exhibited by slow-motion segments.

Fig. 3. The occurrences of scene changes detected by the method described in [5]. Regions with a high density of detections are very likely to contain slow-motion segments.

Fig. 4. A block diagram of the new SMRSD scheme. It consists of three steps as described in the following.

In this step, neighboring slow-motion segments are merged to account for the fact that, in practice, a scene change can happen within a slow-motion replay. Slow-motion segments that are very short (for example, less than 1 second) are discarded. The candidates generated by this step usually have an accuracy of 70% in detection, taking both the miss rate and the false alarm rate into account. To improve this result, we use the following step to reduce the error rate while preserving a high detection speed.
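The logic of Step I can be sketched in Python as follows. This is a simplified illustration, not the voting-rule implementation used in our system: the inputs (per-frame counts of forward- and backward-referencing macroblocks), the function names, and all threshold values (`dominance`, `max_gap`, `min_events`, `merge_gap`, `min_length`) are hypothetical and chosen only to make the example concrete.

```python
from typing import List, Tuple

def direction_flips(forward: List[int], backward: List[int],
                    dominance: float = 0.6) -> List[int]:
    """Return frame indices where the dominant reference direction flips.

    forward[i] and backward[i] are the numbers of macroblocks in frame i
    that refer forward and backward (hypothetical pre-extracted inputs).
    """
    def dominant(f: int, b: int) -> int:
        total = f + b
        if total == 0:
            return 0
        if f / total >= dominance:
            return 1       # mostly forward-referencing
        if b / total >= dominance:
            return -1      # mostly backward-referencing
        return 0           # no clear majority

    labels = [dominant(f, b) for f, b in zip(forward, backward)]
    # A flip is a forward-dominant frame next to a backward-dominant one.
    return [i for i in range(1, len(labels))
            if labels[i] * labels[i - 1] == -1]

def dense_segments(events: List[int], max_gap: int = 5, min_events: int = 4,
                   merge_gap: int = 30, min_length: int = 30
                   ) -> List[Tuple[int, int]]:
    """Turn sorted event frame indices into candidate (start, end) segments."""
    # 1) Group events whose spacing is at most max_gap (high density).
    segments: List[Tuple[int, int]] = []
    run: List[int] = []
    for e in events:
        if run and e - run[-1] > max_gap:
            if len(run) >= min_events:
                segments.append((run[0], run[-1]))
            run = []
        run.append(e)
    if len(run) >= min_events:
        segments.append((run[0], run[-1]))

    # 2) Merge neighboring segments separated by at most merge_gap frames.
    merged: List[Tuple[int, int]] = []
    for start, end in segments:
        if merged and start - merged[-1][1] <= merge_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))

    # 3) Discard segments that are too short.
    return [(s, e) for s, e in merged if e - s + 1 >= min_length]
```

With a frame rate of roughly 30 frames per second, `min_length=30` corresponds to the one-second minimum mentioned above. Isolated detections, such as a genuine scene change in normal play, do not form a dense run and are therefore dropped in the first stage.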
3.2 Step II. Refinement

This refinement step, applied to the candidates found in Step I, is based on the zero-crossing measure [1]. There are two points to be noted. First, our goal is to develop a technique in the compressed domain; however, the zero-crossing method is applied to video frames in the spatial domain. Because the DCT is a unitary transform, the frame difference, being a sum of squared differences, has the same value whether it is calculated in the spatial domain or in the transform domain. The other point is that, in an MPEG stream, the DCT coefficients of predicted frames are the transform values of the prediction error. Hence, we need to modify the calculation of D(n) in Section 2. Chang et al. [8] suggest using a decoding structure as shown in Fig. 5. This decoder structure is equivalent to the original decoder in that it generates the same decoding results. Under this decoding structure, the motion compensation is performed before the inverse DCT. This way, D(n) and the zero-crossing method can be computed in the compressed domain.

This step consists of the following operations, which are applied to each candidate segment found in Step I to refine the segment boundary:

- Find all frames of the segment for which $p_{zc}$ is greater than a preset threshold.
- For each frame near the segment boundary (both inside and outside the segment), make a merge or discard decision based on its $p_{zc}$ value.
- Merge two segments if they are close to each other (for example, within 30 frames).

3.3 Step III. Summarization

This step involves the following operations:

1) Choose a key frame from each slow-motion replay segment [7], [9].
2) Include an extra thirty seconds before and after each segment.

4 Experimental Results

Our scheme is fast because it uses only motion vector and coefficient information and does not need any decompression operation. This section presents the experimental results of our SMRSD scheme.
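The use of coefficient information alone rests on the observation in Section 3.2 that the DCT is unitary, so a sum of squared differences is unchanged by the transform. The following Python sketch checks this numerically for a single 8x8 block. It is only an illustration under simplifying assumptions: the orthonormal DCT-II matrix is built by hand, quantization and motion compensation are ignored, and the function names are hypothetical.

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II matrix C of size n x n, so that C @ C.T = I."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    i = np.arange(n).reshape(1, -1)   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)        # DC row uses a different scale factor
    return c

def dct2(block: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Separable 2-D DCT of a square block."""
    return c @ block @ c.T

def squared_difference(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of squared element-wise differences."""
    return float(np.sum((a - b) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = dct_matrix(8)
    block_prev = rng.uniform(0, 255, size=(8, 8))
    block_curr = rng.uniform(0, 255, size=(8, 8))

    spatial = squared_difference(block_curr, block_prev)
    transform = squared_difference(dct2(block_curr, c), dct2(block_prev, c))
    # The two values agree up to floating-point rounding.
    print(np.isclose(spatial, transform))
```

Summing this per-block quantity over all blocks of a frame gives the same D(n) as the pixel-domain definition, which is why no inverse DCT is needed once the coefficients of each frame are available.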
To validate the scheme, we use three video clips made by three different producers in two different countries, as shown in Table 1. The total duration is more than half an hour.

Table 1. SMRSD Test Data

Sequence   Length (min:sec)   Length (frames)
1016       13:14              23841
1106       11:08              20047
Final      08:09              14659
We also developed a user interface that helps the user browse the extracted SMRSD results, as shown in Fig. 5.

Fig. 5. SMRSD System Interface.

The SMRSD results are shown in Table 2.

Table 2. SMRSD Results

Sequence   # of replay segments   Correct   False   Miss   Inaccurate boundary   Time saved
1016       9                      9         0       0      1                     94.58%
1106       10                     9         0       1      0                     94.16%
Final      1                      1         0       0      0                     98.57%

Our results are visually pleasant. Long segments can be represented effectively by very short ones. In particular, because the operations are performed entirely in the compressed domain and use only motion vectors most of the time, the scheme is well suited to large amounts of data.

5 Conclusions

A new scheme for slow-motion replay segment detection has been described in this paper. We have tested this system in a computer environment similar to that of an ordinary household, and the system (without optimization) runs far faster than real time. Experimental results show that this scheme is effective and superior to other schemes in that it allows a good balance between accuracy and efficiency. Users do not have to spend a long time watching the entire video of a race or game; instead, they can watch the generated summaries to save time.
Acknowledgement. We thank those who gave us comments. Their advice made this paper more complete, and we are very grateful.

References

1. H. Pan, P. van Beek, and M. I. Sezan, "Detection of slow-motion replay segments in sports video for highlights generation," in Proc. ICASSP '01, pp. 1649-1652, 2001.
2. A. Ekin, M. Tekalp, and R. Mehrotra, "Automatic soccer video analysis and summarization," IEEE Trans. Image Proc., Vol. 12, No. 7, pp. 796-807, 2003.
3. J. Song and B.-L. Yeo, "A fast algorithm for DCT-domain inverse motion compensation based on shared information in a macroblock," IEEE Trans. CSVT, Vol. 10, No. 5, pp. 767-775, 2000.
4. H. Pan, B. Li, and M. I. Sezan, "Automatic detection of replay segments in broadcast sports programs by detection of logos in scene transitions," in Proc. ICASSP '02, pp. 3385-3388, 2002.
5. S.-C. Pei and Y.-Z. Chou, "Efficient MPEG compressed video analysis using macroblock type information," IEEE Trans. Multimedia, Vol. 1, No. 4, pp. 321-333, 1999.
6. Y. Rui, A. Gupta, and A. Acero, "Automatically extracting highlights for TV baseball programs," in Proc. of 8th ACM Inter. Conf. on Multimedia, pp. 105-115, 2000.
7. Y.-F. Ma, L. Lu, H.-J. Zhang, and M. J. Li, "A user attention model for video summarization," in Proc. of 10th ACM Inter. Conf. on Multimedia, pp. 533-542, 2002.
8. S.-F. Chang and D. G. Messerschmitt, "Manipulation and compositing of MC-DCT compressed video," IEEE J. Select. Areas Commun., Vol. 13, pp. 1-11, Jan. 1995.
9. T. Liu, H.-J. Zhang, and F. Qi, "A novel video key-frame-extraction algorithm based on perceived motion energy model," IEEE Trans. CSVT, Vol. 13, No. 10, pp. 1006-1013, 2003.
10. V. Kobla, D. DeMenthon, and D. Doermann, "Detection of slow-motion replay sequences for identifying sports videos," in Proc. IEEE Third Workshop on Multimedia Signal Processing, pp. 135-140, 1999.