Automatic Shot Boundary Detection and Classification of Indoor and Outdoor Scenes

A. Miene, Th. Hermes, G. Ioannidis, R. Fathi, and O. Herzog
TZI - Center for Computing Technologies, University of Bremen
Universitätsallee 21-23, D-28359 Bremen, Germany
{andrea hermes gtis fathi herzog}@tzi.de

Abstract. This paper describes our contribution to the TREC 2002 video analysis track. We participated in the shot detection task and in the feature extraction task (features indoors and outdoors). The shot detection approach is based on histogram differences and uses adaptive thresholds. Multiple detected shot boundaries that follow each other within a short temporal interval are grouped together and classified as a gradual change beginning with the first and ending with the last shot boundary in the interval. For the feature extraction task we examined whether it is possible to classify indoor and outdoor shots by their color distribution. To analyze the color distribution we use first-order statistical features. The shots are classified into indoor and outdoor shots using a neural net.

1 Introduction

The Center for Computing Technologies (TZI), University of Bremen, Germany, participated in the video analysis track, in both the shot detection task and the feature extraction task (features indoors and outdoors). The shot detection approach is based on histogram differences. It is divided into two steps: feature extraction and shot boundary detection. First, the histogram differences are calculated for the entire video in real time. Second, shot boundaries are detected. The advantage of this approach is that the adaptive thresholds for shot boundary detection can be set with respect to all extracted features of the complete video sequence. The adaptive threshold is set to a percentage of the maximum of all calculated difference values of the video. In the case of gradual changes, multiple shot boundaries are often detected.
Therefore, multiple detected shot boundaries that follow each other within a short temporal interval are grouped together, and a gradual change is detected beginning with the first and ending with the last shot boundary in the interval. The approach is explained in more detail in Section 2.

To extract the features indoors and outdoors we use a feed-forward neural net as a classifier, trained with a backpropagation learning rule. The input is a feature vector describing the color distribution of an image. The output is the probability for each feature (indoors and outdoors) to appear in the image. The approach is discussed in Section 3.

2 Shot detection

Many approaches to shot boundary detection have been proposed in the literature; an overview is given in [Lienhart, 1999, Yusoff et al., 1998]. The principal methodology of shot boundary detection is to extract one or more features from every nth frame of a video sequence, to compute the difference of the features of consecutive frames, and to compare these differences to a given threshold. Each time the threshold is exceeded, a shot boundary is detected. The various approaches differ in the features they use.

The shot boundary detection system we used for TREC 2002 is based on the approach presented in [Miene et al., 2001]. As mentioned before, the approach can be divided into two main parts: the first extracts all needed features from a video; the second detects the shot boundaries based on the previously extracted features.

For the feature extraction part, each frame is converted into a grayscale image and a histogram H_G is created. Subsequently, the squared differences between each two consecutive frames are calculated:

    H_GDiff(n, n-1) = Σ_{i=0}^{255} (H_G(n)(i) - H_G(n-1)(i))² / max(H_G(n)(i), H_G(n-1)(i))    (1)

H_G(n)(i) denotes the grayscale histogram value at index i of frame n. max(H_G(n)(i), H_G(n-1)(i)), the maximum of the two grayscale histogram values, is used as a normalization factor. This yields a list of feature differences, which is compared to a threshold in order to detect shot boundaries. To determine the adaptive threshold, the maximum of all calculated difference values of the current video is calculated.
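As a concrete illustration, the detection pipeline described above (histogram differences between consecutive frames, an adaptive threshold set as a percentage of the maximum difference, and merging of temporally close detections into gradual changes) can be sketched in Python. This is only a sketch: the function names and the default values for the percentage and the merge window are our own assumptions, not taken from the original system.

```python
import numpy as np

def grayscale_histogram(frame):
    """256-bin grayscale histogram of a frame given as a 2-D uint8 array."""
    hist, _ = np.histogram(frame, bins=256, range=(0, 256))
    return hist.astype(float)

def histogram_difference(h_prev, h_cur):
    """Squared, max-normalized difference of two histograms, as in Eq. (1)."""
    denom = np.maximum(h_prev, h_cur)
    mask = denom > 0  # bins empty in both frames contribute 0; skip to avoid 0/0
    return float(np.sum((h_cur[mask] - h_prev[mask]) ** 2 / denom[mask]))

def detect_boundaries(diffs, th_percentage=30.0, merge_window=10):
    """Adaptive threshold as in Eq. (2), then merging of temporally close hits.

    Returns (start, end, peak) triples: a single-frame group is a hard cut,
    a longer group is a gradual change from start to end, with the exact
    boundary placed at the frame with the maximum difference value.
    """
    th = max(diffs) * th_percentage / 100.0
    hits = [i for i, d in enumerate(diffs) if d > th]
    groups, current = [], []
    for i in hits:
        if current and i - current[-1] <= merge_window:
            current.append(i)       # still within the same (gradual) transition
        else:
            if current:
                groups.append(current)
            current = [i]           # start a new group
    if current:
        groups.append(current)
    return [(g[0], g[-1], max(g, key=lambda i: diffs[i])) for g in groups]
```

As in the paper, the difference list can be filled frame by frame in real time while the video is decoded; the threshold and the merging are then applied in a second pass over the complete list.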
The adaptive threshold for the current video is specified as a percentage of this maximum:

    Th = max{H_GDiff(1, 0), ..., H_GDiff(n, n-1)} · Th_percentage / 100    (2)

For gradual changes such as dissolves or wipes, the shot boundary detection often detects more than one boundary per shot. Therefore, all shot boundaries that belong to the same shot have to be merged into one boundary. This step is illustrated in Figure 1. Shot boundaries are merged if the temporal distance between their occurrences is less than a threshold. The minimal frame number of the merged shot boundaries determines the start, and the maximal frame number determines the end, of the gradual change. The exact boundary position is set to the maximum feature difference value within the merged shot boundaries.

Fig. 1. Merging of multiple detected shot boundaries [Miene et al., 2001].

Before preparing our results for TREC, we tested our shot detection on three videos from the feature development collection, for which we determined the shot boundaries manually. The results of this experiment are shown in Table 2. File denotes the file name of the video, human the number of shot boundaries determined manually, auto the total number of shot boundaries detected by the system, correct the number of correctly detected shot boundaries, false the number of false alarms, and missing the number of shot boundaries that were not detected by the system. The last two columns contain the percentage values for recall and precision. For this test we did not distinguish between hard cuts and gradual changes.

    file        human  auto  correct  false  missing  recall  precision
    00028a.mpg    116    79       79      0       37    68.1        100
    00435.mpg     108    54       47      7       61    42.6       85.2
    00535.mpg     120   108      100      8       20    83.3       92.6
    over all      344   241      226     15      118    65.7       93.8

In the following section we present our approach for the classification of indoor and outdoor scenes.

3 Classification of indoor and outdoor scenes

For the feature extraction task we examined whether it is possible to classify indoor and outdoor shots by their color distribution. To analyze the color distribution, first-order statistical features are used, which are extracted from the histograms of the three color channels (RGB) and from the gray-level histogram. The features calculated from each histogram are average, variance, and number of peaks, each normalized to the interval [0.0, 1.0]. Altogether we calculate 12 statistical features. To classify the shots into indoor and outdoor shots, a feed-forward neural net with backpropagation learning was trained. For this task we used the SNNS (Stuttgart Neural Network Simulator) [SNNSv4.2, 2002]. The 12 statistical features obtained from the histograms are presented at the input layer. The output layer consists of two neurons that take on

values between 0.0 and 1.0, measuring the probability for the features indoors or outdoors in the shot. Two hidden layers with 20 neurons each are initialized with random weights. To train the neural net, several videos from the feature development collection were chosen. Their shots were classified manually to generate 323 training data sets, 178 for indoors and 145 for outdoors. Figure 2 shows the trained neural net.

Fig. 2. Trained neural net.

To classify the shots from the feature extraction test collection, a set of n key frames is extracted from each shot. Every kth frame of a shot is used as a key frame, but in order to be more independent of inaccuracies during the shot detection and of gradual changes (e.g., wipes, fades, or dissolves), a number of frames around the shot boundaries is skipped (see Figure 3).

Fig. 3. Extraction of key frames.

To classify a shot, its set of n key frames is presented to the neural net. For each of the two output neurons a list of n values is obtained, one per key frame. The median of each list gives the final indoors or outdoors probability for the shot. To measure the confidence of the classification result, the difference between the median values of the indoors and the outdoors neuron is calculated. If the difference exceeds a threshold, the shot is classified as containing the feature with the higher probability. The difference is also used for the ranking.
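The feature computation and the median-based shot classification described in this section could be sketched as follows. This is a sketch under assumptions: the exact peak definition, the normalization constants, the acceptance threshold, and all names are illustrative choices not specified in the text, and the neural net itself (trained in SNNS) is treated as external, with only its per-keyframe outputs aggregated here.

```python
import numpy as np
from statistics import median

def histogram_features(hist):
    """Average, variance, and number of peaks of a 256-bin histogram,
    each normalized to [0.0, 1.0]."""
    hist = np.asarray(hist, dtype=float)
    total = hist.sum()
    if total == 0:
        return [0.0, 0.0, 0.0]
    idx = np.arange(256)
    p = hist / total                        # relative bin frequencies
    mean = float((idx * p).sum())
    var = float((((idx - mean) ** 2) * p).sum())
    # assumed peak definition: a bin strictly above both neighbours
    peaks = int(np.sum((hist[1:-1] > hist[:-2]) & (hist[1:-1] > hist[2:])))
    # assumed normalizations: mean by 255, variance by its maximum
    # (255/2)^2, peak count by 128 (at most 127 strict interior maxima)
    return [mean / 255.0, var / 127.5 ** 2, peaks / 128.0]

def color_feature_vector(r_hist, g_hist, b_hist, gray_hist):
    """The 12-dimensional input vector: 3 features per histogram, 4 histograms."""
    vec = []
    for h in (r_hist, g_hist, b_hist, gray_hist):
        vec.extend(histogram_features(h))
    return vec

def classify_shot(keyframe_outputs, accept_threshold=0.2):
    """Median aggregation of per-keyframe net outputs into a shot label.

    keyframe_outputs: (p_indoors, p_outdoors) pairs, one per key frame.
    Returns (label, score); label is None if the median difference does
    not exceed the acceptance threshold, and score is used for ranking.
    """
    med_in = median(p for p, _ in keyframe_outputs)
    med_out = median(q for _, q in keyframe_outputs)
    score = abs(med_in - med_out)
    if score < accept_threshold:
        return None, score
    return ('indoors' if med_in > med_out else 'outdoors'), score
```

In this sketch, `color_feature_vector` produces the input presented to the net for each key frame, and `classify_shot` performs the median aggregation and thresholded decision described above.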

4 Future Work

In the coming months, and for our next participation in TREC 2003, we will concentrate on improving our shot detection approach with respect to the detection of gradual changes. We also have to examine how to obtain better results for the extraction of the indoors and outdoors features. As mentioned before, the neural net was trained with indoor and outdoor example frames from the feature development collection. The TREC evaluation of our results shows that there are serious problems with the classification of the material from the feature test collection, and we are currently analyzing these problems. One major problem appears to be that we trained the net only with examples of indoor and outdoor scenes; the results for scenes containing neither indoor nor outdoor material are therefore undefined. In particular, artificial scenes lead to wrong classifications. In addition, we intend to develop further modules for our feature extraction system in order to extract other features such as text or human faces.

References

[Lienhart, 1999] Lienhart, R. (1999). Comparison of Automatic Shot Boundary Detection Algorithms. In Proc. SPIE Vol. 3656, Storage and Retrieval for Image and Video Databases VII, pages 290-301, San Jose, CA, USA.

[Miene et al., 2001] Miene, A., Dammeyer, A., Hermes, T., and Herzog, O. (2001). Advanced and adapted shot boundary detection. In Fellner, D. W., Fuhr, N., and Witten, I., editors, Proc. of ECDL WS Generalized Documents, pages 39-43.

[SNNSv4.2, 2002] SNNSv4.2 (2002). SNNS Stuttgart Neural Network Simulator User Manual, Version 4.2. University of Stuttgart and University of Tübingen. http://www-ra.informatik.uni-tuebingen.de/downloads/snns/snnsv4.2.manual.pdf

[Yusoff et al., 1998] Yusoff, Y., Christmas, W., and Kittler, J. (1998). A study on automatic shot change detection. Lecture Notes in Computer Science, 1425.