Video Aesthetic Quality Assessment by Temporal Integration of Photo- and Motion-Based Features. Wei-Ta Chu

Transcription:

Video Aesthetic Quality Assessment by Temporal Integration of Photo- and Motion-Based Features. Wei-Ta Chu. H.-H. Yeh, C.-Y. Yang, M.-S. Lee, and C.-S. Chen, "Video Aesthetic Quality Assessment by Temporal Integration of Photo- and Motion-Based Features," IEEE Transactions on Multimedia, vol. 15, no. 8, pp. 1944-1957, 2013.

Introduction. Six factors shape visual aesthetic impression: color, form (shape), motion, spatial layout, depth, and the human body. Goal: design an automatic video aesthetic quality (VAQ) assessor by integrating aesthetic features in the temporal domain.

Overview of the Approach. Let there be a set of videos, each associated with a label representing its aesthetic value: 1 for appealing and 0 for unappealing. A video is separated into several non-overlapping snippets, and each snippet contains the consecutive frames within one second of video. The approach has two stages: (1) compute snippet-based features (SBFs), and (2) combine the SBFs by temporal integration. Binary classification is then performed with an SVM.
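As a minimal sketch of the first step, the snippet segmentation can be written as below, assuming frames are already decoded and the frame rate gives the frames-per-snippet count:

```python
import numpy as np

def split_into_snippets(frames, fps):
    """Split a video (sequence of frames) into non-overlapping
    one-second snippets, dropping any trailing partial snippet."""
    n = (len(frames) // fps) * fps
    return [frames[i:i + fps] for i in range(0, n, fps)]

# Example: a fake 10-second video at 25 fps, frames as HxW arrays.
video = [np.zeros((120, 160)) for _ in range(250)]
snippets = split_into_snippets(video, fps=25)
print(len(snippets))  # 10 snippets of 25 frames each
```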

Preprocessing for Motion-Related Features. Suppose each video is divided into k snippets, each composed of a fixed number of frames. Two preprocessing steps are applied: (1) motion-extraction preprocessing, which uses optical-flow techniques for pixel-based motion estimation, and (2) saliency-based preprocessing, which uses a video-based saliency region detection method.
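A hedged sketch of the motion-extraction step using OpenCV's Farneback dense optical flow (one common pixel-based estimator; the slides do not name a specific optical-flow technique):

```python
import cv2
import numpy as np

def dense_flow(prev_gray, curr_gray):
    """Pixel-based motion estimation between two consecutive frames
    with Farneback dense optical flow (an illustrative choice)."""
    return cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

# flow[..., 0] is the horizontal (x) component, flow[..., 1] the vertical (y).
prev_gray = np.random.randint(0, 256, (120, 160), dtype=np.uint8)
curr_gray = np.roll(prev_gray, 2, axis=1)  # simulate a 2-pixel pan
flow = dense_flow(prev_gray, curr_gray)
print(flow.shape)  # (120, 160, 2)
```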

Preprocessing for Motion-Related Features (continued). We binarize the video-based saliency map obtained for each frame, then extract the largest connected component of the binarized map and refer to it as the foreground region. Four motion-related features are computed: motion space, motion direction entropy, hand-shaking, and shooting type.
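A sketch of the binarization and largest-connected-component extraction, assuming Otsu thresholding (the slides do not specify the binarization rule) and OpenCV's connected-components routine:

```python
import cv2
import numpy as np

def foreground_region(saliency_map):
    """Binarize a per-frame saliency map and keep its largest
    connected component as the foreground region."""
    sal8 = cv2.normalize(saliency_map, None, 0, 255,
                         cv2.NORM_MINMAX).astype(np.uint8)
    _, binary = cv2.threshold(sal8, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    num, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
    if num <= 1:                      # only background found
        return np.zeros_like(binary), None
    # Row 0 of stats is the background; pick the largest remaining component.
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    return (labels == largest).astype(np.uint8), centroids[largest]

sal = np.zeros((120, 160), np.float32)
sal[30:60, 40:90] = 1.0               # a toy salient blob
mask, center = foreground_region(sal)
print(mask.sum(), center)             # blob area and its (x, y) centroid
```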

Motion-Related Features: Motion Space (MS). When videotaping a moving subject, a cameraman should reserve space in front of the subject's moving direction for better VAQ. Let v be the direction of the averaged optical flow over the entire image and d the vector between the foreground center and the image center; the MS feature is computed from the relation (angle) between v and d.
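The transcript omits the MS formula, so the sketch below assumes one plausible form, the cosine of the angle between v and d; the paper's exact expression may differ:

```python
import numpy as np

def motion_space(flow, fg_center, image_center):
    """Sketch of the Motion Space (MS) cue: compare the averaged
    optical-flow direction v with the vector d between the foreground
    center and the image center.  The cosine of the angle between
    v and d is an assumption, not the paper's confirmed formula."""
    v = flow.reshape(-1, 2).mean(axis=0)          # averaged optical flow
    d = np.asarray(image_center) - np.asarray(fg_center)
    denom = np.linalg.norm(v) * np.linalg.norm(d)
    return float(v @ d / denom) if denom > 0 else 0.0
```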

Motion-Related Features: Motion Direction Entropy (MDE). Compute the velocity direction of each pixel and vote the directions into five predefined bins (upward, rightward, downward, leftward, and quasi-steady). The votes form a motion histogram; normalizing it to a probability distribution $p = (p_1, \dots, p_5)$, the MDE feature is its entropy, $f_{\mathrm{MDE}} = -\sum_{b=1}^{5} p_b \log p_b$.
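A sketch of MDE under stated assumptions (the 90-degree direction bins and the quasi-steady magnitude threshold are illustrative choices, not taken from the paper):

```python
import numpy as np

def motion_direction_entropy(flow, steady_thresh=0.5):
    """Vote per-pixel velocity directions into five bins (up, right,
    down, left, quasi-steady) and return the histogram's entropy."""
    dx, dy = flow[..., 0].ravel(), flow[..., 1].ravel()
    mag = np.hypot(dx, dy)
    steady = mag < steady_thresh                   # quasi-steady pixels
    ang = np.arctan2(dy, dx)                       # angle in (-pi, pi]
    # Quantize moving pixels into four 90-degree direction bins.
    moving_bins = ((ang + np.pi + np.pi / 4) // (np.pi / 2)).astype(int) % 4
    hist = np.bincount(np.where(steady, 4, moving_bins), minlength=5)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())
```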

Motion-Related Features: Hand-Shaking (HS). The shaking-detection area is set at the frame border, which better distinguishes the subject's self-motion from hand-shaking. The horizontal motion of the border region in frame j is defined as the average of the horizontal projections of the optical flow in that region, and its direction is the sign of this average. The horizontal unstableness feature measures how often this direction changes across consecutive frames; vertical unstableness is defined analogously from the vertical projections.
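The unstableness formula is missing from the transcript; the sketch below assumes it is the rate of direction reversals of the average border motion across consecutive frames:

```python
import numpy as np

def horizontal_unstableness(border_motions):
    """Sketch of the hand-shaking (HS) cue: border_motions[j] is the
    average horizontal optical flow in the border region of frame j.
    Unstableness is assumed here to be the fraction of consecutive
    frame pairs whose motion direction (sign) flips."""
    signs = np.sign(border_motions)
    signs = signs[signs != 0]          # ignore frames with no motion
    if len(signs) < 2:
        return 0.0
    return float(np.mean(signs[1:] != signs[:-1]))

# A shaky clip flips direction almost every frame.
print(horizontal_unstableness(np.array([0.8, -0.7, 0.9, -0.6, 0.5])))  # 1.0
```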

Motion-Related Features: Shooting Type (ST). Define the border-to-central unstableness ratio for horizontal motion as the ratio between the horizontal motion of the frame border and the horizontal motion of the frame interior (the frame except the border). When the border moves faster than the center, the shot likely traces and focuses on a subject. When the border moves at about the same speed as the center, it is likely a panorama shot. When the border moves slower than the center, it can be regarded as a static shot of a deformable subject.
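A sketch of the ST classification; the tolerance band deciding "moving equally" is an assumed parameter:

```python
def shooting_type(border_motion, center_motion, tol=0.2):
    """Classify the shooting type from the border-to-central motion
    ratio r = |border| / |center| (the tolerance band is an assumption):
    border faster  -> camera traces a subject,
    comparable     -> panorama-like shot,
    border slower  -> static shot of a deformable subject."""
    r = abs(border_motion) / (abs(center_motion) + 1e-8)
    if r > 1 + tol:
        return "subject-tracing"
    if r < 1 - tol:
        return "static-on-deformable-subject"
    return "panorama"

print(shooting_type(2.0, 0.5))  # subject-tracing
```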

Adoption of Photo-Based Features. Four groups of photo-based features are adopted: color harmonization, color saturation and value, composition, and lightness.

Snippet-Based Features. The frame-based features are pooled within each video snippet using six pooling operations: {min, 1st quartile, mean, median, 3rd quartile, max}. Combining these six pooling operators with the 22 frame-based features yields a total of 132 candidate SBFs. For example, the SBF (f_2, min) is built by extracting f_2 (the motion direction entropy) from each frame and then pooling by min (taking the minimum over the snippet). The set of SBFs is thus the Cartesian product of the 22 frame-based features and the six pooling operators.
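A minimal sketch of SBF pooling; the 22 frame-based features are represented generically as columns of an array:

```python
import numpy as np

# The six pooling operators that turn frame-based features into
# snippet-based features (SBFs).
POOLERS = {
    "min": np.min,
    "q1": lambda x: np.percentile(x, 25),
    "mean": np.mean,
    "median": np.median,
    "q3": lambda x: np.percentile(x, 75),
    "max": np.max,
}

def snippet_features(frame_features):
    """frame_features: (n_frames, n_feature_types) array for one snippet.
    Returns the Cartesian product of feature types and pooling operators,
    e.g. 22 features x 6 poolers = 132 SBFs."""
    return {(f, name): float(pool(frame_features[:, f]))
            for f in range(frame_features.shape[1])
            for name, pool in POOLERS.items()}

sbf = snippet_features(np.random.rand(25, 22))
print(len(sbf))  # 132
```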

Temporal-Variation-Aware Integration. How the SBFs change over time affects the audience's visual preference, so a temporal-variation-aware (TVA) method is used to integrate the SBFs. The video is divided into non-overlapping segments, and the correlation between segments performs the integration. Let the k snippets in a video be divided into m segments (groups of snippets); each segment contains d successive snippets, so k = md. Once a snippet feature type is selected, each group is characterized by a d-dimensional feature vector of its snippets' values.

Temporal-Variation-Aware Integration (continued). When the type of SBF is given, there are m feature vectors x_1, ..., x_m, one per segment. Let K(·,·) be a kernel function that measures similarity in some implicit mapping space. The i-th group's relevance is reflected through its kernel similarities to the other groups; doing so for every group yields m integration operators, and a sign function is used in constructing the kernel.
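The exact TVA equations are not reproduced in this transcript, so the sketch below assumes one plausible reading rather than the authors' formulation: each segment's relevance is its average RBF-kernel similarity to the other segments, with the mean and standard deviation appended as the (m+1)-th and (m+2)-th operators mentioned on the next slide:

```python
import numpy as np

def tva_operators(sbf_values, m, kernel_gamma=1.0):
    """Hedged sketch of TVA integration: the k snippet values of one
    SBF are grouped into m segments of d successive snippets, and each
    segment's relevance is taken to be its average RBF-kernel
    similarity to the other segments (an assumed form)."""
    k = len(sbf_values)
    d = k // m
    x = np.reshape(sbf_values[:m * d], (m, d))     # m segment vectors
    # RBF kernel between all pairs of segment vectors.
    sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    K = np.exp(-kernel_gamma * sq)
    relevance = (K.sum(axis=1) - 1.0) / (m - 1)    # exclude self-similarity
    # The slides also add the mean and standard deviation as operators.
    return np.concatenate([relevance, [x.mean(), x.std()]])

print(tva_operators(np.random.rand(12), m=3).shape)  # (5,) = m + 2
```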

Temporal-Variation-Aware Integration (continued). We further include the (m+1)-th and (m+2)-th operators, which are the mean and the standard deviation, respectively. Three versions of the proposed method are compared.

Selection of Key Dimensions. The first version, AFSim, integrates the SBFs using only the simple average operators, mean and standard deviation. The second version, AFTVA, uses the TVA operators to integrate the SBFs. The third version, TVASim, allows all operators to be used for temporal integration. Because the combination of SBFs and temporal-integration operators is very high-dimensional, we employ a sequential forward-search method, a greedy approach that picks SBF/operator combinations in turn.

Selection of Key Dimensions (continued). The selection procedure: (1) find the SBF/operator pair that best classifies all videos; (2) pick another SBF/operator pair that has the best accuracy when joined with the previously selected pairs; (3) repeat step 2 until the number of selections reaches a predefined number. For example, combining the SBF (f_1, mean) with a temporal-integration operator means the video-based feature is built by extracting f_1 from each frame, pooling the frame features within each snippet by mean to obtain the SBF, and then applying the temporal operator to integrate the SBF values over the whole video. A minimal sketch of the greedy search follows.
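This sketch of the sequential forward search uses scikit-learn's SVM and cross-validation; the hyperparameters and data are illustrative:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def forward_select(X, y, n_select):
    """Greedy sequential forward search over feature dimensions:
    repeatedly add the dimension that, joined with those already
    selected, gives the best cross-validated SVM accuracy."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_select:
        def score(dim):
            cols = selected + [dim]
            return cross_val_score(SVC(), X[:, cols], y, cv=5).mean()
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

X, y = np.random.rand(160, 30), np.random.randint(0, 2, 160)
print(forward_select(X, y, n_select=3))
```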

Experimental Results. Telefonica dataset: 160 rated videos, each cropped to a 15-second clip and rated by 33 participants, with rating values in [-2, 2]. The mean opinion scores (MOSs) are sorted, and the median is chosen as a threshold to separate the 160 videos into positive (high-quality) and negative (low-quality) classes.
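A small sketch of the median-MOS binarization:

```python
import numpy as np

def binarize_by_median(mos):
    """Split videos into positive (high quality) and negative (low
    quality) classes by thresholding at the median mean opinion score."""
    threshold = np.median(mos)
    return (np.asarray(mos) > threshold).astype(int)

mos = np.random.uniform(-2, 2, size=160)   # ratings in [-2, 2]
labels = binarize_by_median(mos)
print(labels.sum(), len(labels) - labels.sum())  # roughly 80 / 80
```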

Experimental Results: comparison of designed features. 5-fold cross-validation is repeated 200 times to obtain the averaged accuracy, and AFSim is compared against the baseline MoorthySim. Motion-based features such as ST and HS were chosen within the first three selection runs.
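A sketch of the repeated 5-fold cross-validation protocol with scikit-learn (200 repeats in the paper; fewer here for speed):

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Repeat 5-fold cross-validation with different shuffles and average
# the fold accuracies; the data here are random placeholders.
X, y = np.random.rand(160, 10), np.random.randint(0, 2, 160)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(SVC(), X, y, cv=cv, scoring="accuracy")
print(scores.mean())
```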

Experimental Results: comparison of temporal integrations. AFTVA and TVASim perform better than AFSim and MoorthySim. As the number of selections increases, AFTVA further boosts performance by continuing to find more discriminating SBFs; it turns more video-based features into distinctive cues.

Experimental Results (continued). By considering both the TVA operators and the long-term average operators (mean and standard deviation), TVASim achieves even better performance. The top-ranked feature types in TVASim are ST, MS, and HS, showing again that motion-related features are highly effective for determining VAQ.