Video Processing for Judicial Applications

Konstantinos Avgerinakis, Alexia Briassouli, Ioannis Kompatsiaris
Informatics and Telematics Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
{koafgeri,abria,ikom}@iti.gr
http://www.certh.gr/, www.iti.gr

Post-proceedings of the 2nd International Conference on ICT Solutions for Justice (ICT4Justice 2009)

Abstract. The use of multimedia data has expanded into many domains and applications beyond technical usage, such as surveillance, home monitoring, health supervision, and judicial applications. This work is concerned with the application of video processing techniques to judicial trials in order to extract useful information from them. The automated processing of the large amounts of digital data generated in court trials can greatly facilitate their browsing and access. Video can provide clues about the state of mind and emotion of the speakers, information which cannot be derived from the textual transcripts of the trial, or even from the audio recordings. For this reason, we focus on analyzing the motions taking place in the video, and mainly on tracking gestures and head movements. A wide range of methods is examined in order to find which one is most suitable for judicial applications.

Key words: video analysis, recognition, judicial applications

1 Introduction

The motion analysis of the judicial videos is based on various combinations of video processing algorithms, in order to achieve reliable localization and tracking of significant features in the video. Initially, optical flow is applied to the video in order to extract the moving pixels and their activity. Numerous algorithms exist for the estimation of optical flow, such as the Horn-Schunck method, developed in 1981 [1], and the Lucas-Kanade method [2]. A more recent and sophisticated approach was developed by Bouguet in 2000 [3]; it implements a sparse iterative version of Lucas-Kanade optical flow in pyramids. By applying the optical flow algorithm, we separate the video frame pixels into a set of points that are moving and a set that is static. As some of the optical flow estimates may be caused by measurement noise, a more accurate separation of the pixels into static and active can be obtained by applying higher-order statistics, namely the kurtosis, to the optical flow data. After the pixel motion is estimated and the active pixels are separated from the static ones, only the active pixels are processed. This allows the system to operate with fewer errors, which would otherwise be caused by confusing the noise in static pixels with true motion.
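As a concrete illustration of the dense-flow step used later in Secs. 3.1 and 3.4, the following is a minimal NumPy/SciPy sketch of the classic Horn-Schunck iteration [1]; the smoothness weight alpha and the iteration count are illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.ndimage import convolve

def horn_schunck(im1, im2, alpha=10.0, n_iter=100):
    """Dense optical flow (u, v) between two grayscale frames (Horn-Schunck, 1981)."""
    im1, im2 = im1.astype(np.float64), im2.astype(np.float64)
    # Image derivatives: spatial gradients of the average frame, temporal difference.
    Iy, Ix = np.gradient(0.5 * (im1 + im2))
    It = im2 - im1
    # Kernel averaging the 4-neighbourhood, used for the smoothness term.
    avg = np.array([[0.0, 0.25, 0.0],
                    [0.25, 0.0, 0.25],
                    [0.0, 0.25, 0.0]])
    u = np.zeros_like(im1)  # horizontal flow component
    v = np.zeros_like(im1)  # vertical flow component
    for _ in range(n_iter):
        u_bar, v_bar = convolve(u, avg), convolve(v, avg)
        # Iterative update: pull the flow toward the local average while
        # enforcing the brightness-constancy constraint Ix*u + Iy*v + It = 0.
        common = (Ix * u_bar + Iy * v_bar + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_bar - Ix * common
        v = v_bar - Iy * common
    return u, v
```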

The next step in the motion analysis is interest point tracking in the active pixels. The goal of this stage is to obtain the points which define the human action. First, we want to identify the object that appears to move through the frames. This object can be characterized using specific features that uniquely describe the image and the object. These feature points then need to be matched from frame to frame so as to achieve interest point tracking. Several state-of-the-art algorithms have been examined for the detection and description of these features:

- The SIFT (Scale Invariant Feature Transform) algorithm, published by David Lowe in 1999 [4].
- The SURF (Speeded Up Robust Features) algorithm, presented by Herbert Bay in 2006 [5], which is based on sums of 2D Haar wavelet responses and makes efficient use of integral images.
- The Harris-Stephens corner detection algorithm, developed in 1988, which finds corners where the local gradient matrix has large eigenvalues [6].

After the feature points are detected, interest point matching is needed. This can be accomplished using a kd-tree algorithm, specifically the BBF (Best Bin First) algorithm. BBF finds the closest neighbor of each feature in the next frame based on the distance between the two features' descriptors. In the rest of the paper, the algorithms and the combinations in which they are used are explained in detail. The results of each technique are shown to demonstrate which ones give the best results.

2 Kurtosis-based Activity Area

In all of the methods used in this work, the optical flow estimates can be used immediately, or further processed with higher-order statistics in order to obtain a more accurate estimate of the active pixels [7]. The optical flow values may be caused by true motion in each pixel, or by measurement noise, as expressed by the following two hypotheses:

H_0: v_k^0(\bar{r}) = z_k(\bar{r})
H_1: v_k^1(\bar{r}) = u_k(\bar{r}) + z_k(\bar{r}),    (1)

where v_k^i(\bar{r}) expresses the flow estimate at frame k and pixel \bar{r}, with i = 0, 1 depending on the corresponding hypothesis. Also, u_k(\bar{r}) is the illumination variation caused by true motion and z_k(\bar{r}) is the additive measurement noise. In the literature, additive noise is often modeled by a Gaussian distribution. Thus, the separation of the active from the static pixels is reduced to the detection of Gaussianity in the flow estimates. A classical measure of Gaussianity is the kurtosis, which is zero for a Gaussian random variable y and is given by:

kurt(y) = E[y^4] - 3(E[y^2])^2.    (2)

Thus, once the optical flow is estimated over a sequence of frames, its kurtosis is estimated at each pixel \bar{r}, leading to an image of the kurtosis values. The kurtosis image can easily be binarized, as the kurtosis values for active pixels are significantly higher than those for static pixels with noise-induced flow values. This leads to a binary mask showing pixel activity, called the Activity Area, which can be used to isolate the active from the static pixels.
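A minimal sketch of how Eq. (2) yields the Activity Area, assuming the per-pixel flow magnitudes for a window of frames are stacked into a NumPy array; the mean-plus-standard-deviation threshold is a simple stand-in, not the binarization rule from the paper.

```python
import numpy as np

def activity_area(flow_mags):
    """flow_mags: (num_frames, H, W) array of per-pixel optical flow magnitudes."""
    # Per-pixel sample moments over time, giving kurt(y) = E[y^4] - 3(E[y^2])^2 (Eq. 2).
    m2 = np.mean(flow_mags**2, axis=0)
    m4 = np.mean(flow_mags**4, axis=0)
    kurt = m4 - 3.0 * m2**2
    # Active pixels have much higher kurtosis than pixels with noise-only flow,
    # so a global threshold binarizes the kurtosis image into the Activity Area.
    threshold = kurt.mean() + kurt.std()  # heuristic choice, not from the paper
    return kurt > threshold
```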

3 Combination of Features and Flow for Active Feature Localization

3.1 HS Optical Flow, Kurtosis, SIFT, BBF Matching

In the first approach examined, the Horn-Schunck algorithm is used to estimate the optical flow of the video [1]. The optical flow values are processed using the kurtosis-based technique of Sec. 2 in order to extract the Activity Area for that video. The Activity Area and a video frame masked by it are shown in Fig. 1, from which it can be seen that the moving arm is correctly masked. The Activity Area is used to separate the active pixels from the static ones, which are ignored in the rest of the processing. The SIFT algorithm is applied to the active pixels in order to extract features of interest from them [4]. This feature detector is chosen due to its robustness, and in particular because it has been shown to be invariant to image scale and rotation, as well as robust to local occlusion. The resulting features are matched between successive frames, and the results are shown in Fig. 2, where the blue points indicate SIFT features that have not been matched and the purple lines show the matches between the two frames. These results are good but not entirely accurate, as only a small number of features is found, and the matching does not provide rich enough information about the activity taking place. Therefore, more methods are examined for increased accuracy in the sequel.

Fig. 1. Activity Area extracted via the kurtosis method, and a video frame masked by the Activity Area.
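A sketch of the Sec. 3.1 detection-and-matching step using OpenCV; the file names are hypothetical, and FLANN's randomized kd-tree search is used here as a stand-in for the Best Bin First matcher named in the text (BBF is the approximate search strategy that kd-tree matchers of this kind implement).

```python
import cv2

# Two consecutive frames already masked by the Activity Area (hypothetical names).
f1 = cv2.imread("masked_frame_000.png", cv2.IMREAD_GRAYSCALE)
f2 = cv2.imread("masked_frame_001.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(f1, None)
kp2, des2 = sift.detectAndCompute(f2, None)

# Approximate nearest-neighbour matching with a kd-tree (FLANN_INDEX_KDTREE = 1).
matcher = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
matches = matcher.knnMatch(des1, des2, k=2)

# Lowe's ratio test keeps only matches clearly better than the runner-up.
good = [m for m, n in matches if m.distance < 0.7 * n.distance]
```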

Fig. 2. SIFT points detected on video frames masked by the Activity Area extracted via the kurtosis method.

3.2 Harris-Thomas Corner Detection, Pyramidal Lucas-Kanade Optical Flow, Kurtosis

As features of interest often appear near corners, in this set of experiments the Harris-Thomas corner detection is initially used to detect feature points [6]. As Fig. 3 shows, this method provides more feature points than the previously used SIFT detector, and can therefore be considered to give more reliable and robust results. The motion of these feature points also needs to be estimated in order to better understand the activity taking place, in this case a gesture. A more recent and sophisticated method for optical flow estimation is used here, namely the pyramidal Lucas-Kanade optical flow, which can deal with both large and small flow values. The kurtosis method is applied again in order to accurately isolate the truly active features. As Fig. 4 shows, the resulting masked feature points provide a good localization of the activity of interest in the video, and can therefore be used in subsequent stages for activity classification or recognition.

Fig. 3. Harris-Thomas feature points detected on video frames.
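The Sec. 3.2 pipeline maps naturally onto OpenCV: goodFeaturesToTrack implements the minimum-eigenvalue corner detector (assumed here to correspond to what the paper calls Harris-Thomas), and calcOpticalFlowPyrLK is Bouguet's pyramidal Lucas-Kanade tracker [3]. Frame names and parameter values are illustrative.

```python
import cv2

prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)  # hypothetical names
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Corners whose gradient autocorrelation matrix has two large eigenvalues.
pts = cv2.goodFeaturesToTrack(prev, maxCorners=500, qualityLevel=0.01, minDistance=7)

# Pyramidal Lucas-Kanade: track every corner into the next frame.
nxt, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, pts, None,
                                            winSize=(21, 21), maxLevel=3)
ok = status.ravel() == 1
flow = (nxt - pts)[ok]  # per-corner flow vectors, fed to the kurtosis test of Sec. 2
```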

Fig. 4. Harris-Thomas feature points masked by the Activity Area.

3.3 SIFT, Pyramidal Lucas-Kanade Optical Flow, Kurtosis

In this case, the features are initially detected using the SIFT algorithm, with the results shown in Fig. 5. Afterwards, a more sophisticated method for estimating their motion is used: the pyramidal Lucas-Kanade algorithm is applied to them in order to find their motion throughout the video [3]. The kurtosis of these flow values is estimated so as to eliminate the feature points that are not truly moving and keep only the active ones. Fig. 6 shows that in this case, fewer features of interest (i.e., moving feature points) are detected than with the method of Sec. 3.2.

Fig. 5. SIFT feature points.
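For Sec. 3.3, the same tracker can be seeded with SIFT keypoints instead of corners; cv2.KeyPoint_convert turns the detector output into the point array calcOpticalFlowPyrLK expects. Again, the file names are placeholders.

```python
import cv2
import numpy as np

prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)  # hypothetical names
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Detect SIFT keypoints and convert them to an (N, 1, 2) float32 point array.
kps = cv2.SIFT_create().detect(prev, None)
pts = cv2.KeyPoint_convert(kps).reshape(-1, 1, 2).astype(np.float32)

# Track the SIFT points with pyramidal Lucas-Kanade, as in Sec. 3.2.
nxt, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, pts, None)
flow = (nxt - pts)[status.ravel() == 1]  # flow values for the kurtosis test
```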

Fig. 6. SIFT feature points masked by the Activity Area.

3.4 SURF, Horn-Schunck Optical Flow, Kurtosis

In this case, Horn-Schunck optical flow is applied to the pixels where features were detected, and the truly active pixels are extracted, as before, by applying the kurtosis method described above. Afterwards, the resulting Activity Areas are used to mask the video, and the resulting smaller sequence is input to the SURF algorithm. The SURF method [5] is examined for feature extraction, as it has been shown to be more robust than the SIFT algorithm against a wider range of image transformations. Additionally, it runs significantly faster, which is important when processing many long video streams. Features on the active pixels are then found by SURF and, as Fig. 7 shows, they successfully isolate the most interesting active parts of the video.

Fig. 7. SURF feature points found in the video after it is masked by the Activity Area.
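A sketch of the Sec. 3.4 feature step. Note that SURF is patented and ships only in OpenCV contrib builds compiled with the nonfree module enabled, so cv2.xfeatures2d.SURF_create may be unavailable in stock installations; the Hessian threshold and file name are illustrative.

```python
import cv2

# A frame already masked by the Activity Area (hypothetical file name).
masked = cv2.imread("activity_masked_frame.png", cv2.IMREAD_GRAYSCALE)

# SURF with an illustrative Hessian threshold; larger values keep fewer,
# stronger blob responses. Requires a nonfree-enabled OpenCV contrib build.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kps, des = surf.detectAndCompute(masked, None)
# Keypoints now lie only in the active region and their descriptors can feed
# later matching or activity classification stages.
```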

4 Conclusions

A variety of video processing algorithms has been applied to judicial videos containing characteristic gestures, in order to determine which ones are most appropriate for isolating the activity of interest. The optical flow is extracted first, as the moving points are of interest in this work. The active pixels are then separated from the static ones using a kurtosis-based method. The SIFT algorithm is initially tested for feature extraction, as it is known to be robust, but is shown to provide feature points that, although noiseless, are too sparse for an accurate depiction of the activity taking place. The Harris-Thomas detector and the SIFT algorithm, combined with a pyramidal version of the Lucas-Kanade algorithm, provide an accurate representation of the features of interest in the video. Finally, the SURF algorithm is examined, as it is one of the current state-of-the-art methods for feature extraction, being robust to a wide range of transformations and computationally efficient. The results of applying SURF to the active pixel areas are very satisfactory and can be considered reliable input for activity classification at later stages. Future work includes examining additional feature detectors, such as the STAR detector, as well as performing feature point matching for all the detectors examined.

Acknowledgements

The research leading to these results has received funding from the European Community's Seventh Framework Programme FP7/2007-2013 under grant agreement FP7-214306 (JUMAS).

References

1. Horn, B.K.P., Schunck, B.G.: Determining Optical Flow. Artificial Intelligence 17, 185-203 (1981)
2. Lucas, B., Kanade, T.: An Iterative Image Registration Technique with an Application to Stereo Vision. In: Proc. 7th International Joint Conference on Artificial Intelligence (IJCAI), pp. 674-679 (1981)
3. Bouguet, J.: Pyramidal Implementation of the Lucas Kanade Feature Tracker. In: OpenCV distribution (2000)
4. Lowe, D.G.: Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision 60, 91-110 (2004)
5. Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: SURF: Speeded Up Robust Features. Computer Vision and Image Understanding 110, 346-359 (2008)
6. Harris, C., Stephens, M.: A Combined Corner and Edge Detector. In: Proceedings of the 4th Alvey Vision Conference, pp. 147-151 (1988)
7. Briassouli, A., Kompatsiaris, I.: Robust Temporal Activity Templates Using Higher Order Statistics. IEEE Transactions on Image Processing, to appear.