Searching Video Collections: Part I
- Introduction to Multimedia Information Retrieval
- Multimedia Representation
  - Visual Features (Still Images and Image Sequences): Color, Texture, Shape, Edges, Objects, Motion
- Multimedia Indexing
  - Video Segmentation
  - Shot-Boundary Detection
  - Effects Detection
  - Beyond Basic Visual Features: Text, Face
Video Indexing
- Analysis of Still Images
  - Features: Color, Texture, Shape
  - Distance Metrics
- Analysis of Image Sequences
  - Segmentation
  - Cut Detection
  - Motion Vectors
  - Shot Transitions
  - Camera Operations
- Scene Analysis
  - Selection of Keyframes
  - Shot Similarity
[Figure: video, scenes, shots, frames]
Camera Motion Descriptors
[Figure: camera track, boom, and dolly motion modes]
[Figure: camera pan, tilt, and roll motion modes]
Video Indexing
[Figure: Multilayered Hierarchical Structure of a Video Clip. Copyright by J. Hunter 2001, Dublin Core and MPEG-7 Metadata for Video]
Video Indexing
Semantic Units (Hierarchy)
- Object, Regions, Frames
- Shot: continuous sequence of frames captured from one camera
- Scene: one or more shots presenting different views of the same event (related in time or space)
- Segment: one or more related scenes
Transitions
- Cut: an abrupt shot change that occurs in a single frame
- Dissolve: continuous transition, progressive linear combination
- Fade: a slow change in brightness, usually resulting in or starting with a solid black frame
- Wipe: pixels from the second shot replace those of the first shot in a regular pattern
- Others: special effects; editing tools can offer up to 200 effects
Video Indexing Example
Dublin Core Metadata
Shots                                     | Scenes
Description      Formats                  | Description  Formats
Text             Text                     | Text         Text
Camera Distance  Controlled Vocabulary    | Script       Text
Camera Angle     Controlled Vocabulary    | Transcript   Text
Camera Motion    Controlled Vocabulary    | Edit List    Text
Duration         secs, frames             | Duration     secs, frames
Start Time       secs, frame #, SMPTE     | Start Time   secs, frame #, SMPTE
End Time         secs, frame #, SMPTE     | End Time     secs, frame #, SMPTE
KeyFrame         GIF, JPEG                | KeyFrame     GIF, JPEG
Lighting         Controlled Vocabulary    | Locale       Text
Open Trans       Controlled Vocabulary    | Cast         Text
Close Trans      Controlled Vocabulary    | Object       Text
Reliable Shot Detection
The three most commonly used transition types are:
- Abrupt cuts (hard cuts)
- Fades
- Dissolves
Cut Detection
- Cut: sudden change of image content between consecutive shots
- Cut detection: separate the video into shots and calculate features for each shot separately
[Figure: shots along a time axis]
Shot Transitions
- Fade In: change of image content from a monochrome color to the image (example: fade from white/black)
- Fade Out: change of image content from the image to a monochrome color (example: fade to white/black)
[Figure: fade in and fade out along a time axis]
What is a Dissolve?
- Dissolve: shot transition with image overlays
[Figure: dissolve along a time axis]
Types of Dissolve
- Cross dissolve
- Additive dissolve
Shot Boundary Detection
- Pixel Differences
- Statistical Differences
- Histograms
- Compression Differences
- Edge Tracking
- Motion Vectors
[Figure: video frame with SMPTE time code 00:12:45:20]
Pixel Differences: Basic Idea
- Compute the total number of pixels whose value changes by more than a threshold $t$
- If this total is greater than a second threshold $T_b$, a shot boundary is detected
Drawbacks
- Sensitive to camera motion (pan, zoom)
- Sensitive to object motion
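The two-threshold test above can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the nested-list frame representation, and the toy pixel values are assumptions made for the example, not part of the original slides.

```python
def pixel_change_count(frame_a, frame_b, t):
    """Count pixels whose gray value changes by more than t between two frames."""
    return sum(
        1
        for row_a, row_b in zip(frame_a, frame_b)
        for a, b in zip(row_a, row_b)
        if abs(a - b) > t
    )


def is_shot_boundary(frame_a, frame_b, t, t_b):
    """Declare a shot boundary when more than t_b pixels changed by more than t."""
    return pixel_change_count(frame_a, frame_b, t) > t_b


# Toy 2x3 gray-scale frames (hypothetical values).
frame_1 = [[10, 10, 10], [200, 200, 200]]
frame_2 = [[12, 11, 10], [198, 201, 200]]  # small noise, same shot
frame_3 = [[250, 240, 10], [20, 30, 200]]  # most pixels change

same_shot = is_shot_boundary(frame_1, frame_2, t=20, t_b=3)
new_shot = is_shot_boundary(frame_2, frame_3, t=20, t_b=3)
```

Because every pixel is compared with the pixel at the same position, a pan or a large moving object changes many pixels at once, which is exactly the sensitivity listed under Drawbacks.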
Pixel Differences: Improvements
- Basic method plus a 3x3 averaging filter applied before the comparison [Zhang93]
- Divide the image into 12 regions and find the best match for each region in a neighborhood around the region in the other image; the difference is the sum of the region differences [Shahraray95]
- Chromatic images: change in gray level divided by the gray level in the 2nd image; relatively constant for dissolves and fades
- Still sensitive to camera and object motion
Histogram Differences
- Use color or gray-scale histograms of pixels as a feature to detect shot boundaries
- Assumption: for the same background and the same objects, there is very little change in the histogram
- Let $H_i(j)$ be the histogram value for the $j$-th bin of the $i$-th frame; the difference is then given by
  $CHD_i = \sum_j |H_i(j) - H_{i+1}(j)|$
- If the difference exceeds a threshold, a shot boundary is detected: $CHD_i > T_b$
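A small Python sketch of the bin-wise histogram difference, assuming 8-bit gray-scale frames stored as nested lists and the sum of absolute bin differences as the distance. The function names, the bin count, and the toy frames are illustrative assumptions.

```python
def gray_histogram(frame, bins=4, max_value=256):
    """Histogram of gray values with equally wide bins (assumes 0 <= value < max_value)."""
    hist = [0] * bins
    width = max_value // bins
    for row in frame:
        for value in row:
            hist[min(value // width, bins - 1)] += 1
    return hist


def histogram_difference(hist_a, hist_b):
    """Sum of absolute bin-wise differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


frame_a = [[0, 10, 70], [130, 200, 250]]
frame_b = [[250, 200, 130], [70, 10, 0]]  # same pixels, rearranged
frame_c = [[0, 0, 0], [10, 10, 10]]       # very different content

chd_ab = histogram_difference(gray_histogram(frame_a), gray_histogram(frame_b))
chd_ac = histogram_difference(gray_histogram(frame_a), gray_histogram(frame_c))
boundary_ac = chd_ac > 4  # hypothetical threshold T_b = 4
```

The pair `frame_a`/`frame_b` shows why the measure is robust to motion inside a shot, and also why two different images can share one histogram: rearranging pixels leaves the histogram unchanged.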
Histograms: Example
[Figure: frame histograms around a cut]
Histograms: Difference Graph
[Figure: difference graph with cuts and threshold marked]
Histogram-Based Cut Detection
- Different images can have the same histogram
[Figure: an obvious and a not so obvious example of image pairs with the same histogram]
Histogram-Based Cut Detection: Challenges
- Different images can have similar histograms
- Color values of subsequent images can change significantly without a cut occurring:
  - explosions
  - change of scene illumination
  - fast movement of large objects
- Performance of histogram-based cut detection: between 90% and, in some cases, even 98%
Histogram Differences: Improvements
- A coarse quantization is good enough
  - Typically a 6-bit code: the 2 highest-order bits of each of the R, G and B channels
  - This leads to 64-bin histograms
  - Good trade-off between accuracy and speed for shot boundary detection
- Threshold selection is crucial
  - The threshold $T_b$ depends very much on the content
- Gradual transitions: use two thresholds instead of one global threshold, one for abrupt cuts and one for special effects
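The 6-bit quantization can be sketched as follows, assuming 8-bit R, G and B channels. The bit layout (R in the top two bits, then G, then B) and the function names are assumptions chosen for the example; any fixed ordering gives the same 64 bins.

```python
def color_code_6bit(r, g, b):
    """Pack the 2 highest-order bits of each 8-bit channel into a 6-bit code (0..63)."""
    return ((r >> 6) << 4) | ((g >> 6) << 2) | (b >> 6)


def color_histogram_64(pixels):
    """64-bin color histogram over an iterable of (r, g, b) tuples."""
    hist = [0] * 64
    for r, g, b in pixels:
        hist[color_code_6bit(r, g, b)] += 1
    return hist


pixels = [(255, 255, 255), (250, 240, 230), (0, 0, 0), (64, 128, 192)]
hist = color_histogram_64(pixels)
```

The two near-white pixels fall into the same bin, which is the intended effect of the coarse quantization: small color variations inside a shot do not move mass between bins.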
Histogram Comparison
[Figure: Talk Show Sequence, Copyright Philips (MPEG-7 contribution). Frame numbers shown: 405, 459, 810, 972, 1026. Similarity measures shown: 0.4264, 0.4298, 0.1602, 0.0383]
Histogram Differences: Twin-Comparison Method
- Compute $CHD_i$ for all frames in the video
- Mark camera breaks where $CHD_i > T_b$
- Mark potential gradual-transition subsequences $GT = \{[F_s, F_e]\}$ wherever $CHD_i > T_s$
- For each gradual transition $[F_s, F_e]$, accumulate the frame-to-frame differences into $AC$
- If $AC > T_b$, declare $[F_s, F_e]$ a gradual transition
- This algorithm works well and is widely used
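A simplified Python sketch of the twin-comparison steps, assuming `chd[i]` holds the histogram difference between frame `i` and frame `i + 1` and that $T_s < T_b$. It closes a candidate as soon as one difference drops to $T_s$ or below, which is a simplification made for this example; the function name and the sample values are hypothetical.

```python
def twin_comparison(chd, t_b, t_s):
    """Return (cuts, gradual) from a list of frame-to-frame differences.

    cuts: indices i with chd[i] > t_b (cut between frame i and i + 1).
    gradual: (start_frame, end_frame) pairs whose accumulated difference exceeds t_b.
    """
    cuts, gradual = [], []
    start, accumulated = None, 0.0

    def close_candidate(end):
        # Keep the candidate only if its accumulated difference exceeds t_b.
        nonlocal start, accumulated
        if start is not None and accumulated > t_b:
            gradual.append((start, end))
        start, accumulated = None, 0.0

    for i, d in enumerate(chd):
        if d > t_b:            # abrupt change: camera break
            close_candidate(i)
            cuts.append(i)
        elif d > t_s:          # moderate change: potential gradual transition
            if start is None:
                start = i
            accumulated += d
        else:                  # quiet frame pair: candidate ends here
            close_candidate(i)
    close_candidate(len(chd))
    return cuts, gradual


chd = [1, 2, 30, 1, 6, 7, 8, 6, 1, 2, 5, 1]
cuts, gradual = twin_comparison(chd, t_b=20, t_s=4)
```

In the sample sequence, the single large value is reported as a cut, the run of moderate values accumulates past $T_b$ and is reported as a gradual transition, and the isolated moderate value near the end is discarded.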
IBM's CueVideo Shot Boundary Detection
- Detects cuts, dissolves, fades and other gradual changes
- Compares multiple pairs of frames: 1, 3 and 7 frames apart
- Processes decoded frames
- Supports MPEG, QT, AVI, live feed
- No user-tuned parameters: allows batch processing
- Detection of flashes and bad frames
- One pass: allows live video processing
[Figure: video frame with SMPTE time code 00:12:45:20. Copyright IBM Almaden]
CueVideo Histogram Example
[Figure]
Edge Change Ratio (ECR)
Properties
- $s_i$ and $s_{i-1}$: number of edge pixels in image $i$ and image $(i-1)$
- $E^{out}_{i-1}$: pixel in image $(i-1)$ is an edge pixel, pixel in image $i$ is not an edge pixel
- $E^{in}_i$: pixel in image $(i-1)$ is not an edge pixel, pixel in image $i$ is an edge pixel
- Use of broad edges (noise independence)
- Edge change ratio between images $i$ and $(i-1)$:
  $ECR_i = \max\left(\frac{E^{in}_i}{s_i}, \frac{E^{out}_{i-1}}{s_{i-1}}\right)$
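A minimal Python sketch of the edge change ratio, assuming the edge images are already available as binary nested lists (1 = edge pixel). The broadening (dilation) of edges mentioned above is omitted to keep the example short, and the function name and toy edge maps are illustrative assumptions.

```python
def edge_change_ratio(edges_prev, edges_curr):
    """ECR between two binary edge maps of equal size (no edge broadening applied)."""
    s_prev = sum(map(sum, edges_prev))  # number of edge pixels in image i-1
    s_curr = sum(map(sum, edges_curr))  # number of edge pixels in image i
    e_out = e_in = 0
    for row_p, row_c in zip(edges_prev, edges_curr):
        for p, c in zip(row_p, row_c):
            if p and not c:
                e_out += 1  # edge pixel disappears
            elif c and not p:
                e_in += 1   # edge pixel appears
    ratio_in = e_in / s_curr if s_curr else 0.0
    ratio_out = e_out / s_prev if s_prev else 0.0
    return max(ratio_in, ratio_out)


prev = [[1, 1, 0, 0], [0, 0, 0, 0]]
same = [[1, 1, 0, 0], [0, 0, 0, 0]]
shifted = [[0, 1, 1, 0], [0, 0, 0, 0]]  # edge moved by one pixel
other = [[0, 0, 0, 0], [0, 0, 1, 1]]    # entirely different edges

ecr_same = edge_change_ratio(prev, same)
ecr_shifted = edge_change_ratio(prev, shifted)
ecr_other = edge_change_ratio(prev, other)
```

The one-pixel shift already yields a ratio of one half in this unbroadened version, which illustrates why broad edges are used: with dilated edges, small displacements no longer count as entering or exiting pixels.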
Computation of ECR: Example
[Figure: Image (i-1) and Image i are converted to edge images; each edge image is ANDed with the inverted edge image of the other frame, giving EC_out (i-1) and EC_in (i), from which the ECR is computed]
ECR Cut Detection
[Figure: five plots of D over time, labeled Inside Shot, Cut, Fade Out, Fade In, Dissolve]
ECR Cut Detection: Cuts
- If $ECR_i$ is the edge change ratio between frames $i$ and $(i-1)$, a cut is detected if $ECR_i \geq T$, where $T$ is a threshold
- Fast object and camera motion leads to high ECR values without cuts
[Figure: cuts]
ECR Cut Detection: Fade In, Fade Out
- Fade out: the number of edge pixels is zero after the last frame of the sequence
- Fade in: the number of edge pixels is zero before the first frame of the sequence
[Figure: fade in, fade out]
ECR Cut Detection: Problems
- Fast object or camera motion
- Explosions
- Fades and dissolves: soft transitions are difficult to detect
- Other effects: wipe detection is unreliable
- Performance typically between 90 and 95 percent
Shot-Boundary Detection: Conclusions
- Histogram-based techniques are good at recognizing cuts
- Standard-deviation techniques are good at recognizing fades
- Dissolves are the most challenging
Problems
- Ground truth: experimental data must be analyzed manually
- Database? Benchmarks?
- Definition of a fade/dissolve
Text Detection: Applications
- Annotation and search of image and video libraries: TV, movie studios, advertising, and surveillance
- Automatic identification and logging of the beginning and end of key events based on captions
- Video summarization
- Ticker tape analysis
- Commercial detection
- Sports program indexing
Text Detection: Design Decisions
- What kind of text occurrences? Scene text, overlay text
- With what style attributes? Font size, font type, text color, any
- In what kind of media data? Image-based, video-based, both
- What should be achieved? Localization, segmentation, recognition
- How will the results be used? Indexing, object-based video encoding
Example: MPEG-4 Text Extraction
- Locate text of any size at any position in images, web pages and videos
- Segment and recognize text
- Encode extracted text as a rigid foreground object in MPEG-4 (with Yen-Kuang Chen)
[Figure: PSNR Y (27.5 to 31.5) versus bit rate (160 to 195 KBits/sec) for Single VOP and Multiple VOP]
Example: OCR result: Dec 25 1998
Text Detection Example: Latin Script
Text Detection: Korean Script Example
Text Extracted from Video
Face Detection
Pool of Features => ~130,000 features for a 24x24 window
Rapid Computation
[Figure with x and y axes]
Rainer Lienhart, Jochen Maydt. An Extended Set of Haar-like Features for Rapid Object Detection. IEEE ICIP 2002, pp. 900-903, Sep. 2002.
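The slide itself only names the topic, so the following Python sketch rests on an assumption: that the rapid computation refers to the summed-area table (integral image) commonly used for upright Haar-like features, where any rectangle sum costs four table lookups. The function names and the toy image are hypothetical, and rotated features are not covered.

```python
def integral_image(image):
    """Summed-area table with one extra row and column of zeros at the top and left."""
    h, w = len(image), len(image[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left corner (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Upright two-rectangle feature: left half minus right half (w assumed even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
ii = integral_image(image)
inner = rect_sum(ii, 1, 1, 2, 2)
feature = two_rect_feature(ii, 0, 0, 4, 3)
```

Once the table is built in a single pass, the cost of a feature no longer depends on its size, which is what makes evaluating a very large feature pool per window feasible.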
Cascade of Classifiers
Premise
- The size of the feature pool (>100,000) exceeds what any reasonable classifier can handle
- A cascade of classifiers (a special kind of decision tree) can outperform a single-stage classifier because it can use more features at the same computational complexity
- Use boosting (Discrete/Real/Gentle AdaBoost, LogitBoost)
[Figure: Input Pattern -> Stage 1 -> Stage 2 -> ... -> Stage N -> Object, with a reject branch at every stage]
Stage   Pass branch (objects)   Reject branch (non-objects)   Reject branch (objects)
1       .998                    .5                            .002
2       .998^2 = .996           .5^2                          .004
N       .998^N ~ .90            .5^N                          ~ .1
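The stage probabilities can be checked with a short calculation. The sketch below assumes that .998 is the per-stage hit rate for objects, that .5 is the per-stage pass rate for non-objects, and that stages act independently so the rates multiply; these readings of the figure, and the function name, are assumptions.

```python
import math


def cascade_rates(stage_hit_rate, stage_false_alarm, n_stages):
    """Overall hit rate and false-alarm rate after n independent stages."""
    return stage_hit_rate ** n_stages, stage_false_alarm ** n_stages


hit_2, false_alarm_2 = cascade_rates(0.998, 0.5, 2)

# Number of stages at which the overall hit rate has dropped to about 0.90.
stages_for_90 = math.log(0.90) / math.log(0.998)

hit_n, false_alarm_n = cascade_rates(0.998, 0.5, round(stages_for_90))
```

Under these assumptions the overall hit rate reaches about .90 after roughly fifty stages, at which point the fraction of non-objects still passing is vanishingly small.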
Cascade Concept
[Figure: the target concept with successive regions labeled as background removal in stages 1, 2, 3, 4 and 5]
Face Recognition: Eigenfaces
Thank you for your attention
Searching Video Collections: Overview
Part I
- Introduction to Multimedia Information Retrieval
- Multimedia Representation
- Multimedia Indexing
Part II
- Audio Analysis
- Speech Indexing
- Query Formulation
- Multimedia Retrieval
Part III
- Browsing
- Distribution/Streaming
- Evaluation
- Multimedia IR Applications
- Conclusions
Edge Detection
- Basic idea: 1st and 2nd derivative of an edge
- The position of the edge can be estimated from the maximum of the 1st derivative or from the zero-crossing of the 2nd derivative
- Generalize the technique to calculate the derivative of a two-dimensional image
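A one-dimensional Python sketch of both estimates, using simple forward differences on a sampled intensity profile. The function names and the sample signal are assumptions made for illustration; a real detector would smooth the signal first.

```python
def first_derivative(signal):
    """Forward differences: d[i] = signal[i + 1] - signal[i]."""
    return [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]


def edge_from_first_derivative(signal):
    """Index i of the largest absolute first difference (edge between samples i and i + 1)."""
    d1 = first_derivative(signal)
    return max(range(len(d1)), key=lambda i: abs(d1[i]))


def zero_crossings_of_second_derivative(signal):
    """Indices i where the second difference changes sign between d2[i] and d2[i + 1]."""
    d2 = first_derivative(first_derivative(signal))
    return [i for i in range(len(d2) - 1) if d2[i] * d2[i + 1] < 0]


# Smooth step edge from dark (10) to bright (50).
signal = [10, 10, 12, 20, 40, 48, 50, 50]
d1 = first_derivative(signal)
d2 = first_derivative(d1)
edge_index = edge_from_first_derivative(signal)
crossings = zero_crossings_of_second_derivative(signal)
```

Both estimates point at the same place in this profile: the steepest step lies between samples 3 and 4, and the second difference changes sign between the entries centered on those two samples.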
Canny Edge Detector
- Designed to be an optimal edge detector (according to particular criteria)
- Takes a gray-scale image as input and produces as output an image showing the positions of tracked intensity discontinuities
Canny Edge Detector: Multi-stage Process
- The image is smoothed by Gaussian convolution
- A simple 2-D first-derivative operator highlights regions of the image with high first spatial derivatives
- Non-maximal suppression: the algorithm tracks along the top of these ridges and sets to zero all pixels that are not actually on the ridge top
- The tracking process exhibits hysteresis
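The hysteresis step can be sketched on its own in Python, assuming the gradient magnitudes have already been smoothed, differentiated and thinned by non-maximal suppression. The two thresholds `low` and `high`, the function name, and the toy magnitude map are assumptions for the example; the earlier stages are not implemented here.

```python
from collections import deque


def hysteresis_threshold(magnitude, low, high):
    """Keep pixels >= high, plus pixels >= low that connect to them (8-neighborhood)."""
    h, w = len(magnitude), len(magnitude[0])
    edges = [[0] * w for _ in range(h)]
    queue = deque()
    # Seed the tracking with all strong pixels.
    for y in range(h):
        for x in range(w):
            if magnitude[y][x] >= high:
                edges[y][x] = 1
                queue.append((y, x))
    # Grow along weak pixels that touch an already accepted edge pixel.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and not edges[ny][nx]
                        and magnitude[ny][nx] >= low):
                    edges[ny][nx] = 1
                    queue.append((ny, nx))
    return edges


magnitude = [[0, 5, 9, 5, 0],
             [0, 0, 0, 0, 0],
             [5, 0, 0, 0, 5]]
edges = hysteresis_threshold(magnitude, low=4, high=8)
```

Weak responses next to the strong ridge pixel are kept as part of the same edge, while the isolated weak responses in the bottom corners are dropped, which is how hysteresis reduces broken edges without admitting noise.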