Leveraging Textural Features for Recognizing Actions in Low Quality Videos


Leveraging Textural Features for Recognizing Actions in Low Quality Videos Saimunur Rahman, John See, Chiung Ching Ho Centre of Visual Computing, Faculty of Computing and Informatics Multimedia University, Cyberjaya 63100, Selangor, Malaysia RoViSP 2016, Penang, Malaysia Rahman, See and Ho Leveraging Texture for HAR MMU, Cyberjaya 1 / 18

Visual human actions. Human actions are major visual events in movies, news, etc. Low quality videos suffer from low frame resolution, low frame rate, compression artifacts and motion blurring. We recognize human actions in low quality videos, leveraging textures together with shape and motion features to improve action recognition from such videos.

Motivation. Recognizing human actions from video is of central importance due to its broad real-world application domain: surveillance, human-computer interaction, video indexing, etc. Many methods have been proposed in recent years, but the majority focus on high quality videos that offer fine details and strong signal fidelity, and are not suitable for real-time and lightweight applications. Current methods are not designed for processing low quality videos.

Summary of Approach. Detect space-time patches with a feature detector and describe them using shape and motion descriptors. Calculate textural features from the entire space-time volume. Combine shape, motion and textural features to improve performance. Summary of Contributions: Propose textural features to alleviate the limitations of shape and motion features. Use BSIF-TOP as a textural feature descriptor for action recognition in low quality videos. Evaluate various textural features on low quality videos.

Related Work. Shape and motion features: Space-Time Interest Points [Laptev et al.'05], Dense Trajectories [Wang et al.'11]. Textural features: LBP-TOP [Kellokumpu et al.'09], Extended LBP-TOP [Mattivi and Shao'09]. Similar approaches: Joint Feature Utilization [Rahman et al.'15, See and Rahman'15].

Outline: 1. Shape and Motion Features, 2. Textural Features, 3. Dataset, 4. Evaluation Framework, 5. Experimental Results, 6. Conclusion.

Shape and Motion Feature Representation. Spatio-temporal interest points are detected with the Harris3D detector [Laptev'05]. The 3D patch around each interest point is described using HOG and HOF [Laptev'08]. HOG, the histogram of oriented gradients, encodes shape; HOF, the histogram of optical flow, encodes motion.
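The orientation-histogram idea behind HOG can be sketched as below. This is only the single-cell building block, not Laptev's full descriptor (which grids the 3D patch into cells and concatenates their histograms); HOF is analogous but bins optical-flow directions instead of gradient directions.

```python
import numpy as np

def hog_patch(patch, bins=8):
    """Unsigned gradient-orientation histogram of one 2-D patch."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)                           # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # fold into [0, pi)
    # Magnitude-weighted histogram over orientation bins.
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-10)               # L1-normalise

h = hog_patch(np.random.rand(16, 16))                # 8-bin shape descriptor
```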

Textural Feature Representation. Three types of textural features are calculated from the entire space-time volume: LBP, Local Binary Patterns [Zhao et al.'08]; LPQ, Local Phase Quantization [Zhao et al.'08]; and BSIF, Binarized Statistical Image Features [Kannala and Rahtu'12]. To obtain dynamic textures we apply the three orthogonal planes (TOP) technique [Zhao et al.'08]: features are calculated from the XY, XT and YT planes of the space-time volume (XYT).
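A minimal sketch of the TOP idea with basic LBP, assuming a video stored as a (T, H, W) numpy array. For brevity only the centre slice of each plane is encoded; the full LBP-TOP descriptor accumulates histograms over all slices of each plane, and the real LBP uses interpolated circular neighbourhoods.

```python
import numpy as np

def lbp_codes(img):
    """Basic 8-neighbour LBP codes for one 2-D slice (no interpolation)."""
    c = img[1:-1, 1:-1]                              # centre pixels
    shifts = [(-1,-1),(-1,0),(-1,1),(0,1),(1,1),(1,0),(1,-1),(0,-1)]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        n = img[1+dy:img.shape[0]-1+dy, 1+dx:img.shape[1]-1+dx]
        codes |= (n >= c).astype(np.uint8) << bit    # threshold against centre
    return codes

def lbp_top(volume):
    """Concatenate LBP histograms from the XY, XT and YT planes of (T,H,W)."""
    t, h, w = volume.shape
    planes = [volume[t // 2],                        # XY: middle frame
              volume[:, h // 2, :],                  # XT: fixed y
              volume[:, :, w // 2]]                  # YT: fixed x
    hists = [np.bincount(lbp_codes(p).ravel(), minlength=256) for p in planes]
    return np.concatenate([hh / hh.sum() for hh in hists])  # 3 x 256 bins

feat = lbp_top(np.random.rand(20, 32, 32))           # 768-dim dynamic texture
```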

Dataset: KTH Actions [Schüldt et al.'04]. 599 videos in total, captured in a controlled environment. 6 action classes performed by 25 actors in 4 different scenarios. Sampling rate: 25 fps; resolution: 160×120 pixels. Evaluation protocol: original experimental setup by the authors. Six downsampled versions were created: 3 spatial (SDα) and 3 temporal (SDβ). We limit α, β = {2, 3, 4}, where α and β denote spatial and temporal downsampling to a half, a third and a quarter of the original resolution or frame rate respectively.
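The downsampling protocol above can be illustrated as follows; the slides do not specify the resampling filter, so plain striding (nearest-neighbour) is used here purely for illustration.

```python
import numpy as np

def downsample(video, alpha=1, beta=1):
    """Spatial downsampling by alpha (per axis) and temporal by beta
    on a (T, H, W) volume, via simple striding."""
    return video[::beta, ::alpha, ::alpha]

v = np.zeros((100, 120, 160))            # 100 frames at 160x120 (H=120, W=160)
sd2 = downsample(v, alpha=2)             # SD2: half the spatial resolution
td3 = downsample(v, beta=3)              # TD3: one third of the frame rate
```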

Dataset: HMDB51 [Kuehne et al.'11]. 6,766 videos in total, covering 51 action classes, collected from movies and YouTube. Videos are annotated with a rich set of meta-labels including quality information; three quality labels are used, i.e. 'good', 'medium' and 'bad'. Evaluation protocol: the three training-testing splits by the authors. We use the specified splits for training, while testing is done using only videos with 'bad' and 'medium' labels; for clarity, we denote them as HMDB-BQ and HMDB-MQ respectively.

Evaluation Framework. Two parallel pipelines are fused. Shape-motion pipeline: input video → STIP detection → HOG/HOF description → feature encoding. Textural pipeline: spatio-temporal textures (LBP/LPQ/BSIF over the x, y, t planes) → feature histograms. The combined representation is classified with a multi-class non-linear SVM.
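The fusion and classification step can be sketched as below. The slides say only "multi-class non-linear SVM", so the L1 normalisation and the exponential chi-square kernel (a common choice for histogram features, usable as a precomputed SVM kernel) are illustrative assumptions, not the authors' confirmed settings.

```python
import numpy as np

def fuse(hog_hof_bow, texture_hist):
    """Concatenate the L1-normalised shape-motion BoW and texture histograms."""
    a = hog_hof_bow / (hog_hof_bow.sum() + 1e-10)
    b = texture_hist / (texture_hist.sum() + 1e-10)
    return np.concatenate([a, b])

def chi2_kernel(A, B, gamma=1.0):
    """Exponential chi-square kernel between rows of A and rows of B."""
    d = ((A[:, None, :] - B[None, :, :]) ** 2 /
         (A[:, None, :] + B[None, :, :] + 1e-10)).sum(-1)
    return np.exp(-gamma * d)

# Hypothetical sizes: 4000-word HOG/HOF codebook + 768-dim texture histogram.
x = fuse(np.random.rand(4000), np.random.rand(768))
K = chi2_kernel(x[None, :], x[None, :])   # could be fed to an SVM as precomputed
```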

Experimental Results: KTH dataset. Performance (average accuracy over all classes) comparison. Best method: HOG+HOF+BSIF-TOP. Spatially downsampled videos benefit greatly from textural features. BSIF-TOP outperforms the other textural features.

Experimental Results: HMDB51 dataset. Performance (average accuracy over all classes) comparison. Best method: HOG+HOF+BSIF-TOP. Textures vastly improve performance on both 'bad' and 'medium' quality videos. BSIF-TOP outperforms the other textural features.

Experimental Results: BSIF-TOP vs. other textures. Performance improvement by BSIF-TOP over LBP-TOP and LPQ-TOP when aggregated with HOG+HOF. LPQ-TOP is better for spatially downsampled videos; LBP-TOP is better for temporally downsampled videos. Using BSIF-TOP, the HMDB-BQ and HMDB-MQ results improve to almost double the baseline.

Experimental Results: Computational Complexity. Computational cost (feature detection/calculation + quantization time) of various feature descriptors. Runtimes reported on a Core i7 3.6 GHz machine with 32 GB RAM. All tests were run on a sampled video from the KTH-SD2 dataset consisting of 656 frames. Ranking of descriptors by speed: LPQ-TOP > BSIF-TOP > HOG+HOF > LBP-TOP.

Conclusion. We leveraged textural features to improve the recognition of human actions in low quality video clips. Considering that most current approaches involve only shape and motion features, the use of textural features is a novel proposition that improves recognition performance by a good margin. BSIF-TOP offers a significant leap of around 16% and 18% on the KTH-SD4 and HMDB-MQ datasets respectively, over their original baselines. In future, we intend to extend this work to a larger variety of human action datasets. It is also worth designing textural features that are more discriminative and robust to complex backgrounds.

Acknowledgement. This work is supported, in part, by MOE Malaysia under the Fundamental Research Grant Scheme (FRGS) project FRGS/2/2013/ICT07/MMU/03/4.

Thank You! Q & A