Video Summarization Using MPEG-7 Motion Activity and Audio Descriptors

1 Video Summarization Using MPEG-7 Motion Activity and Audio Descriptors. Ajay Divakaran, Kadir A. Peker, Regunathan Radhakrishnan, Ziyou Xiong and Romain Cabasson. Presented by Giulia Fanti.

2 Overview: motivation; overview of MPEG compression; keyframe extraction (use MPEG-7 motion information to extract better keyframes); constant pace video skimming (effectively summarize varying action levels); audio-assisted news video browsing; sports highlights detection (golf, soccer).

3 Motivation: the amount of video available is enormous; users want to search and navigate it easily; emphasis on computational efficiency and on incorporation into consumer system hardware.

4 How was video previously summarized? Use keyframes (the first frame of a shot); colour as similarity metric; a keyframe describes the whole shot. Proposition: use MPEG-7 intensity of motion activity descriptors!

5 MPEG OVERVIEW: How does MPEG work?

6 MPEG OVERVIEW: MPEG (cont'd)

7 MPEG OVERVIEW: MPEG (cont'd)

8 MPEG OVERVIEW: MPEG (cont'd). Figure labels: Frame N, Frame N+1.

9 KEYFRAME IDENTIFICATION: Specify a notion of fidelity (keyframe quality). S = keyframes $s_1, \dots, s_m$; R = regular (remaining) frames $r_1, \dots, r_n$. Distance from frame $r_i$ to its nearest keyframe: $d_i = \min_{j = 1, \dots, m} d(s_j, r_i)$. Semi-Hausdorff distance: $d_{sh}(S, R) = \max_{i = 1, \dots, n} d_i$.
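
The fidelity measure on this slide can be sketched in a few lines. This is only an illustration: the scalar frame features and the `abs_diff` distance are assumptions standing in for whatever per-frame feature and metric d(., .) is actually used.

```python
def semi_hausdorff(keyframes, frames, dist):
    """Largest distance from any frame to its nearest keyframe."""
    if not keyframes:
        raise ValueError("need at least one keyframe")
    # d_i = min_j d(s_j, r_i); d_sh = max_i d_i
    return max(min(dist(s, r) for s in keyframes) for r in frames)


def abs_diff(a, b):
    # Hypothetical distance between two scalar frame features.
    return abs(a - b)


features = [0, 1, 2, 9, 10]           # made-up per-frame features
print(semi_hausdorff([1, 9], features, abs_diff))  # 1
print(semi_hausdorff([0], features, abs_diff))     # 10
```

A lower value means every frame in the shot lies close to some keyframe, so the keyframe set represents the shot more faithfully.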

10 KEYFRAME IDENTIFICATION: Keyframe Extraction Ideas (1). Baseline technique for comparison: for each set of n keyframes, compute the semi-Hausdorff distance d_sh for each shot; choose the keyframe set with the lowest semi-Hausdorff distance. Computationally very expensive, but optimal.
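
A brute-force version of this baseline might look like the sketch below. The function name `best_keyframes`, the scalar features and the absolute-difference distance are illustrative assumptions; the point is that the search visits every n-subset of frames, which is why it is so expensive.

```python
from itertools import combinations


def best_keyframes(frames, n, dist):
    """Try every n-subset of frame indices and keep the one with the
    smallest semi-Hausdorff distance to the whole shot."""
    best_set, best_score = None, float("inf")
    for idx in combinations(range(len(frames)), n):
        score = max(min(dist(frames[i], r) for i in idx) for r in frames)
        if score < best_score:
            best_set, best_score = idx, score
    return best_set, best_score


features = [0, 1, 2, 9, 10]  # made-up per-frame features
print(best_keyframes(features, 2, lambda a, b: abs(a - b)))  # ((1, 3), 1)
```

The number of candidate sets grows as "len(frames) choose n", so this is only practical as a reference for judging cheaper heuristics.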

11 KEYFRAME IDENTIFICATION: Keyframe Extraction Ideas (2). To extract n keyframes, examine the cumulative motion function: divide the range of the motion function into n intervals and choose the frame corresponding to the middle of each interval. Figure axis: Cumulative Motion Intensity from First Frame.
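
One way to read this idea in code, assuming a list of non-negative per-frame motion activity values (the name `motion_keyframes` and the sample data are made up):

```python
from bisect import bisect_left
from itertools import accumulate


def motion_keyframes(activity, n):
    """Pick n frame indices at the midpoints of n equal slices of the
    cumulative motion activity."""
    cumulative = list(accumulate(activity))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("shot has no motion activity")
    # Midpoint of the k-th slice of the cumulative range.
    targets = [(k + 0.5) * total / n for k in range(n)]
    # First frame whose cumulative activity reaches each target.
    return [bisect_left(cumulative, t) for t in targets]


activity = [1, 1, 1, 1, 4, 4, 4, 4]   # quiet first half, busy second half
print(motion_keyframes(activity, 2))  # [4, 6]
print(motion_keyframes(activity, 4))  # [2, 4, 6, 7]
```

Compared with uniform sampling in time, more keyframes land in the high-motion part of the shot.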

12 KEYFRAME IDENTIFICATION: Similarity. A single keyframe works for over 90% of low-motion frames.

13 KEYFRAME IDENTIFICATION: Keyframe Extraction Ideas (3). Progressively increase resolution: take the first and last frames as keyframes; at each iteration, add a keyframe in the middle of the motion spectrum.
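
The slide does not spell out the refinement rule, so the sketch below is one possible interpretation: between every pair of adjacent keyframes, add the frame where the cumulative motion activity reaches the halfway point. The function name and data are illustrative.

```python
from bisect import bisect_left
from itertools import accumulate


def refine_keyframes(activity, iterations):
    """Start from the first and last frames and repeatedly insert a
    keyframe at the cumulative-motion midpoint of each gap."""
    cumulative = list(accumulate(activity))  # assumes activity >= 0
    keys = [0, len(activity) - 1]
    for _ in range(iterations):
        new_keys = [keys[0]]
        for a, b in zip(keys, keys[1:]):
            halfway = (cumulative[a] + cumulative[b]) / 2
            mid = bisect_left(cumulative, halfway, a, b)
            if a < mid < b:          # skip gaps that cannot be split
                new_keys.append(mid)
            new_keys.append(b)
        keys = new_keys
    return keys


activity = [1, 1, 1, 1, 4, 4, 4, 4]
print(refine_keyframes(activity, 1))  # [0, 5, 7]
print(refine_keyframes(activity, 2))  # [0, 4, 5, 6, 7]
```

Each pass roughly doubles the number of keyframes, so a browser can stop at whatever resolution it needs.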

14 Constant Pace Skimming using Motion Activity: two interpretations (speed up/slow down playback, or change the sampling rate). Uniform summarization amounts to fast playback; instead, fix the motion activity and adjust the video accordingly. Set a minimum action threshold for the entire playback.
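
Under the "change the sampling rate" interpretation, a minimal sketch is to keep a frame each time the accumulated motion activity reaches a fixed budget. The `budget` parameter and the sample values are assumptions for illustration, not values from the slides.

```python
def constant_pace_indices(activity, budget):
    """Keep one frame per `budget` units of accumulated motion activity,
    so quiet stretches are skipped and busy stretches are kept densely."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    kept, accumulated = [], 0
    for i, a in enumerate(activity):
        accumulated += a
        if accumulated >= budget:
            kept.append(i)
            accumulated = 0
    return kept


activity = [1, 1, 1, 1, 4, 4, 4, 4]
print(constant_pace_indices(activity, 2))  # [1, 3, 4, 5, 6, 7]
```

Every second frame survives in the quiet half, while every frame survives in the busy half, so the perceived pace of the skim is roughly constant.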

15 CONSTANT PACE SKIMMING: Testing. Surveillance data: colour data is useless! No quantitative explanation of results. Figure panels: Uniform Sampling; Motion Compensation.

16 CONSTANT PACE SKIMMING: Testing (commercial video). Observation: sometimes you can get semantic information from the motion vectors! Figure panels: Golf, News, Soccer, Basketball.

17 Audio-Assisted News Browsing: news is structured; large-scale semantic segmentation; individual shots make up a larger segment. Goal: extract query segment and generate summary. Old approach: speech/non-speech detection; train GMMs for each speaker; fit each new speech segment to the GMMs; computationally complex.

18 AUDIO-ASSISTED NEWS BROWSING: Instead, try the sound-recognition framework by Casey et al.: train HMMs for various sounds offline; run the Viterbi algorithm on the HMMs online; computationally cheap! Given a sound clip, return a histogram of state frequencies. Sound classes shown: dog barks, man talks, lady talks, glass break.
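
The online step (Viterbi decoding followed by a state-frequency histogram) can be sketched as below. This is a toy: it uses a two-state HMM with discrete observations and made-up probabilities, whereas the real framework works on continuous audio features. All probabilities must be non-zero because the code works in the log domain.

```python
import math


def viterbi(obs, start, trans, emit):
    """Most likely state path for a discrete-observation HMM."""
    n = len(start)
    score = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in range(n)]
    back = []
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            # Best predecessor for state s at this time step.
            best = max(range(n), key=lambda p: prev[p] + math.log(trans[p][s]))
            ptr.append(best)
            score.append(prev[best] + math.log(trans[best][s]) + math.log(emit[s][o]))
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):       # trace the back-pointers
        state = ptr[state]
        path.append(state)
    return path[::-1]


def state_histogram(path, n_states):
    """Fraction of time steps spent in each state."""
    counts = [0] * n_states
    for s in path:
        counts[s] += 1
    return [c / len(path) for c in counts]


start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.9, 0.1], [0.1, 0.9]]
path = viterbi([0, 0, 0, 1, 1], start, trans, emit)
print(path, state_histogram(path, 2))
```

The histogram is a compact, fixed-length signature of a clip, which is what makes later comparison between segments cheap.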

19 Principal Cast Identification: Structure. Feature extraction: extract motion activity, colour, and audio features from news (MPEG-7 intensity of motion activity from P-frames; 64-bin colour histogram from I-frames; audio energy bands projected onto HMM class bases). ID speaker changes: classification as male, female, or speech with music; clustering determines break points. Use the sound recognition framework to ID speaker changes. Merge motion/speaker clusters to ID principal speakers per segment. Apply the browsing to each segment.

20 PRINCIPAL CAST IDENTIFICATION: Feature Extraction. Sum energy bands; project onto sound class principal components. Figure time axis: 3 s, 6 s, 9 s, 12 s.

21 Source: Casey, M., MPEG-7 Sound Recognition Tools, IEEE Transactions on Circuits and Systems for Video Technology, June. Sound classes shown: dog barks, man speaks, lady speaks, glass breaks. Feature vector: [ ]

22 PRINCIPAL CAST IDENTIFICATION: ID Speaker Changes. Observation: the HMM framework is just like GMMs; each state in the HMM is like a cluster in feature space! Use KL divergence as the metric.
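
Comparing two state-frequency histograms with KL divergence could look like the sketch below. The smoothing constant `eps` and the symmetrized variant are assumptions added here so the measure stays finite and order-independent; the slides only say that KL divergence is the metric.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetrized form, so that comparing a to b equals b to a."""
    return kl_divergence(p, q) + kl_divergence(q, p)


speaker_a = [0.6, 0.3, 0.1]   # made-up state histograms
speaker_b = [0.1, 0.3, 0.6]
print(symmetric_kl(speaker_a, speaker_a))  # 0.0
print(symmetric_kl(speaker_a, speaker_b) > 0)  # True
```

A large divergence between neighbouring segments is then a candidate speaker change.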

23 PRINCIPAL CAST IDENTIFICATION: Clustering. Contiguous set of female speech segments. Build a dendrogram by merging clusters to ID speakers.
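
Bottom-up merging of this kind can be sketched as follows. Single linkage, the scalar items and the stopping rule (`n_clusters`) are assumptions chosen to keep the example short; the slides do not state which linkage or pruning rule is used.

```python
def agglomerate(items, dist, n_clusters):
    """Repeatedly merge the two closest clusters (single linkage) and
    record each merge, which is the information a dendrogram encodes."""
    clusters = [[i] for i in range(len(items))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(items[i], items[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges


segments = [0, 1, 10, 11, 30]   # made-up one-dimensional segment features
clusters, merges = agglomerate(segments, lambda x, y: abs(x - y), 3)
print(clusters)  # [[0, 1], [2, 3], [4]]
```

In the speaker setting, `items` would be per-segment state histograms and `dist` a divergence between them.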

24 AUDIO-ASSISTED NEWS BROWSING: Testing. Training data: 3.5 hours of news video from 4 different TV channels, partitioned 90%-10% training-validation. Actual testing: one 34-minute news segment and one 59-minute segment.

25 Results

26 Sports Highlight Detection: motion vectors are noisy in sports footage; use temporal motion patterns to detect highlights; use the structure of the game to help; combine visual and audio features; focus on detection of interesting parts.

27 SPORTS HIGHLIGHTS DETECTION: Golf. Smooth out motion vectors; look for long stretches of low activity followed by high activity; stitch together 10-second segments starting at the spike in activity. Misses putts (slow camera motion).
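
A rough sketch of the golf heuristic, assuming a per-frame motion activity list. The thresholds `low` and `high`, the minimum quiet duration, and the trailing moving average used for smoothing are illustrative choices, not values from the slides.

```python
def moving_average(values, window):
    """Trailing moving average used to smooth noisy activity values."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def golf_highlights(activity, fps, low, high, min_quiet_s, clip_s=10):
    """Return (start, end) frame ranges that begin at an activity spike
    following a long quiet stretch."""
    clips, quiet_run = [], 0
    for i, a in enumerate(activity):
        if a < low:
            quiet_run += 1
            continue
        if a > high and quiet_run >= min_quiet_s * fps:
            clips.append((i, min(len(activity), i + clip_s * fps)))
        quiet_run = 0
    return clips


activity = [0] * 5 + [9] * 3 + [0] * 2 + [9]
print(golf_highlights(activity, fps=1, low=1, high=5, min_quiet_s=4, clip_s=2))
# [(5, 7)]
```

In practice the raw activity would be passed through `moving_average` first; the final spike in the example is ignored because the quiet stretch before it is too short.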

28 SPORTS HIGHLIGHTS DETECTION: Soccer. Locate all audio volume peaks; for each peak, check whether play stopped and stayed stopped; concatenate the periods immediately preceding stopped play. Testing: 7 soccer games (Korea, USA, Europe).
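
The soccer rule could be sketched like this, assuming per-second audio volume and motion activity lists. Treating "play stopped" as motion activity staying below `stop_thresh` for `stop_len` seconds is an assumption, as are all parameter names and values.

```python
def soccer_highlights(volume, activity, vol_thresh, stop_thresh, stop_len, lead):
    """For each loud local peak followed by a sustained stop in play,
    return the (start, end) range of the `lead` seconds before it."""
    clips = []
    for t in range(1, len(volume) - 1):
        is_peak = (volume[t] >= vol_thresh
                   and volume[t] > volume[t - 1]
                   and volume[t] >= volume[t + 1])
        if not is_peak:
            continue
        after = activity[t + 1:t + 1 + stop_len]
        if len(after) == stop_len and all(a < stop_thresh for a in after):
            clips.append((max(0, t - lead), t))
    return clips


volume = [1, 1, 1, 9, 1, 1, 1, 8, 1, 1, 1, 1]
activity = [5, 5, 5, 5, 0, 0, 0, 5, 5, 5, 5, 5]
print(soccer_highlights(volume, activity, 5, 1, 3, 2))  # [(1, 3)]
```

The second loud peak is discarded because play continues after it, which is how the rule filters out crowd noise that is not tied to an event.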

29 SPORTS HIGHLIGHTS DETECTION: Unified framework. Impractical to make a separate algorithm for every sport; wish to combine highlight detection for general sport. Consider soccer, golf, and baseball.

30 SPORTS HIGHLIGHTS DETECTION: Technique. Interesting events are usually marked by applause. Want to classify: applause, cheering, ball hits, music, speech, speech with music. Train HMMs for each class; use Mel-frequency cepstral coefficients (MFCCs).

31 SPORTS HIGHLIGHTS DETECTION: Technique. Collect all sequences of uninterrupted cheering; keep all sequences that last more than 67% of the longest cheering segment; add a time cushion to the start and end of cheering; use the length of cheering as an indicator of importance. Figure label: Amplitude x Time.
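
Given a per-second sequence of audio class labels, the selection rule on this slide can be sketched as below. The label strings, the `cushion` length and the output format are assumptions; only the 67% rule and the ranking by length come from the slide.

```python
def cheering_highlights(labels, keep_ratio=0.67, cushion=1, cheer="cheering"):
    """Find uninterrupted cheering runs, keep those longer than
    keep_ratio times the longest, pad them, and rank by length."""
    runs, start = [], None
    for i, label in enumerate(list(labels) + [None]):  # sentinel closes last run
        if label == cheer and start is None:
            start = i
        elif label != cheer and start is not None:
            runs.append((start, i))
            start = None
    if not runs:
        return []
    longest = max(end - begin for begin, end in runs)
    kept = [r for r in runs if (r[1] - r[0]) > keep_ratio * longest]
    kept.sort(key=lambda r: r[1] - r[0], reverse=True)
    # (padded start, padded end, cheering length as importance score)
    return [(max(0, b - cushion), min(len(labels), e + cushion), e - b)
            for b, e in kept]


labels = (["speech"] * 2 + ["cheering"] * 6 + ["speech"]
          + ["cheering"] * 2 + ["music"] + ["cheering"] * 5)
print(cheering_highlights(labels))  # [(1, 9, 6), (11, 17, 5)]
```

The two-second burst of cheering is dropped because it is shorter than 67% of the longest run.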

32 SPORTS HIGHLIGHTS DETECTION: Highlight Extraction Framework

33 Classification results

34 Future work: better clustering algorithms (multi-level pruning on dendrograms); more sophisticated associations between audio and visual features; assessment of semantic success of summarization; improve audio-based video browsing (more robust; use the semantic info from audio classification more); content-adaptive techniques that learn variations in content.

35 Conclusions: totally heuristic approach; seems to work for their needs; summarization works best within a semantic segment; use MPEG-7 generalized sound recognition to ID semantic units; use domain knowledge to ID regions of high/low motion activity.

36 Questions?
