Person Re-identification for Improved Multi-person Multi-camera Tracking by Continuous Entity Association

Similar documents
Re-identification for Online Person Tracking by Modeling Space-Time Continuum

Performance Measures and the DukeMTMC Benchmark for Multi-Target Multi-Camera Tracking. Ergys Ristani

Human Upper Body Pose Estimation in Static Images

3D Spatial Layout Propagation in a Video Sequence

Automatic Gait Recognition. - Karthik Sridharan

Learning the Three Factors of a Non-overlapping Multi-camera Network Topology

QMUL-ACTIVA: Person Runs detection for the TRECVID Surveillance Event Detection task

Automatic Tracking of Moving Objects in Video for Surveillance Applications

Hide-and-Seek: Forcing a network to be Meticulous for Weakly-supervised Object and Action Localization

Bidirectional Recurrent Convolutional Networks for Video Super-Resolution

Using temporal seeding to constrain the disparity search range in stereo matching

Internet of things that video

with Deep Learning A Review of Person Re-identification Xi Li College of Computer Science, Zhejiang University

Latent Variable Models for Structured Prediction and Content-Based Retrieval

Performance Evaluation Metrics and Statistics for Positional Tracker Evaluation

Expanding gait identification methods from straight to curved trajectories

LEARNING A SPARSE DICTIONARY OF VIDEO STRUCTURE FOR ACTIVITY MODELING. Nandita M. Nayak, Amit K. Roy-Chowdhury. University of California, Riverside

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus

Video Alignment. Literature Survey. Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin

Video Alignment. Final Report. Spring 2005 Prof. Brian Evans Multidimensional Digital Signal Processing Project The University of Texas at Austin

LSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University

Synscapes A photorealistic syntehtic dataset for street scene parsing Jonas Unger Department of Science and Technology Linköpings Universitet.

Efficient Segmentation-Aided Text Detection For Intelligent Robots

Time Analysis of Pulse-based Face Anti-Spoofing in Visible and NIR

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation

Partial Face Matching between Near Infrared and Visual Images in MBGC Portal Challenge

Track-clustering Error Evaluation for Track-based Multi-Camera Tracking System Employing Human Re-identification

From Structure-from-Motion Point Clouds to Fast Location Recognition

Adaptive Action Detection

Pull the Plug? Predicting If Computers or Humans Should Segment Images Supplementary Material

Salient Region Detection and Segmentation in Images using Dynamic Mode Decomposition

Human Pose Estimation with Deep Learning. Wei Yang

VISION & LANGUAGE From Captions to Visual Concepts and Back

Optical flow and tracking

DeepIM: Deep Iterative Matching for 6D Pose Estimation - Supplementary Material

Flow-Based Video Recognition

Automatic visual recognition for metro surveillance

Topological Mapping. Discrete Bayes Filter

Joint Inference in Image Databases via Dense Correspondence. Michael Rubinstein MIT CSAIL (while interning at Microsoft Research)

Object Purpose Based Grasping

Definition, Detection, and Evaluation of Meeting Events in Airport Surveillance Videos

Class 3: Advanced Moving Object Detection and Alert Detection Feb. 18, 2008

Object Detection by 3D Aspectlets and Occlusion Reasoning

Chapter 9 Object Tracking an Overview

Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients

A GENERIC FACE REPRESENTATION APPROACH FOR LOCAL APPEARANCE BASED FACE VERIFICATION

LARGE-SCALE PERSON RE-IDENTIFICATION AS RETRIEVAL

Depth from Stereo. Dominic Cheng February 7, 2018

Automatic Shadow Removal by Illuminance in HSV Color Space

Improving Face Recognition by Exploring Local Features with Visual Attention

The Kinect Sensor. Luís Carriço FCUL 2014/15

Learning from 3D Data

Object Detection Based on Deep Learning

A Comparison of CNN-based Face and Head Detectors for Real-Time Video Surveillance Applications

Continuous Multi-Views Tracking using Tensor Voting

Improving Vision-based Topological Localization by Combining Local and Global Image Features

Projected Texture for Hand Geometry based Authentication

Abstract. 1 Introduction. 2 Motivation. Information and Communication Engineering October 29th 2010

Color Feature Based Object Localization In Real Time Implementation

Lecture 18: Human Motion Recognition

Detecting motion by means of 2D and 3D information

Multilayer and Multimodal Fusion of Deep Neural Networks for Video Classification

Deep Character-Level Click-Through Rate Prediction for Sponsored Search

Combining Appearance and Topology for Wide

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

Multimodal Medical Image Retrieval based on Latent Topic Modeling

FAST: A Framework to Accelerate Super- Resolution Processing on Compressed Videos

Research Faculty Summit Systems Fueling future disruptions

Spatio-temporal Feature Classifier

Face Matching between Near Infrared and Visible Light Images

Cascade Region Regression for Robust Object Detection

Selection of Scale-Invariant Parts for Object Class Recognition

Online Tracking Parameter Adaptation based on Evaluation

Learning to Track at 100 FPS with Deep Regression Networks - Supplementary Material

ECCV Presented by: Boris Ivanovic and Yolanda Wang CS 331B - November 16, 2016

Beyond Bags of Features

Segmentation and Tracking of Partial Planar Templates

Robust False Positive Detection for Real-Time Multi-Target Tracking

Summarization of Egocentric Moving Videos for Generating Walking Route Guidance

AUTOMATIC VISUAL CONCEPT DETECTION IN VIDEOS

A multiple hypothesis tracking method with fragmentation handling

arxiv: v1 [cs.cv] 20 Sep 2017

08 An Introduction to Dense Continuous Robotic Mapping

Exploring Similarity Measures for Biometric Databases

Multiple camera fusion based on DSmT for tracking objects on ground plane

Gradient of the lower bound

Classification of objects from Video Data (Group 30)

Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Saliency Detection for Videos Using 3D FFT Local Spectra

Real-time target tracking using a Pan and Tilt platform

Class 5: Attributes and Semantic Features

Person Detection Using Image Covariance Descriptor

Supplementary Material for Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains

(Deep) Learning for Robot Perception and Navigation. Wolfram Burgard

Multi-region Bilinear Convolutional Neural Networks for Person Re-Identification

Liangjie Hong*, Dawei Yin*, Jian Guo, Brian D. Davison*

Learning video saliency from human gaze using candidate selection

Transcription:

Person Re-identification for Improved Multi-person Multi-camera Tracking by Continuous Entity Association Neeti Narayan, Nishant Sankaran, Devansh Arpit, Karthik Dantu, Srirangaraj Setlur, Venu Govindaraju University at Buffalo

Overview Introduction - Person analysis (Re-ID, MPMCT), Challenges Related Work - Existing methods Motivation and Goals Proposed Approach - Continuous entity association for person tracking Future Work - Spatio-temporal based tracking approach incorporating visual appearance and location together

Introduction 3

Automated Analysis of Large Video Data Video surveillance Activity and behavior characterization Increase in number of deployed cameras 4 Increase in the workload of video operators Decrease in efficiency Growing demand of automated analysis and understanding video content Key person analysis tasks: Person recognition, verification and tracking

Person Re-identification (ReID) Target re-identification retrieves all and only the gallery images of the same target as the query. An end-to-end person ReID system that includes person detection and re-identification 5

Multi-Person Multi-Camera Tracking (MPMCT) 6

Related Work 7

Person Re-identification Siamese deep neural network for person re-identification + + - 8 Learns similarity metric from image pixels directly. People need not be enrolled. No motion modeling. Real-world scenarios not modeled. (Yi, Dong, Zhen Lei, and Stan Z. Li. Deep metric learning for practical person re-identification, 2014)

Person Re-identification An improved deep learning architecture for person re-identification + + - Addresses re-identification problem Does not require persons to be enrolled Additional modalities like face, gait etc are not utilized No motion modeling / location based association Does not handle real world scenarios of needing to associate multiple simultaneous observations 9 (Ahmed, Ejaz, Michael Jones, and Tim K. Marks. An improved deep learning architecture for person re-identification. Proceeding of the IEEE CVPR 2015)

Person Tracking Harry Potter s Marauder s Map: Localizing and Tracking Multiple Persons-of-Interest by Nonnegative Discretization + + + - 10 Person localization and tracking. Complex indoor scenarios. Uses color, person detection & non-background info. No spatial locality constraint enforced (person at multiple places simultaneously). Persons to be tracked must be enrolled previously. (Yu, Shoou-I., Yi Yang, and Alexander Hauptmann. "Harry potter's marauder's map: Localizing and tracking multiple persons-of-interest by nonnegative discretization." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013.)

Motivation and Goals 11

Motivation 12 Target ReID and MPMCT are clearly different. But, they share several common aspects as well. Assume semantic notion of identity. Some components of the solution to one problem can be used to solve the other. ReID involves associating object hypotheses, hence, possible to draw some parallels to tracking as well. Tracking failures can be effectively recovered by learning from historical visual semantics and tracking associations.

Goals Automated analysis of video data Do not rely on constant human interaction. Within the context of data association, introduce a learning perspective to person tracking Address influence of human appearance, face biometric and location transition on person re-identification. Minimal assumptions Do not assume enrollment of people. High tracking accuracy Do not compromise on tracking accuracy. Track all people in the scene very efficiently with minimum identity switches. Design metrics to quantify the tracking system. 13

Proposed Approach 14

Analysis of People in Public Spaces A model for multi-camera tracking Continuous entity association Between current and previous timestamp detections Steps in learning detection associations and tracking people: 15 Person detection Feature extraction based on human appearance, biometric and location constraints Association probability matrix Most probable associations - Linear programming problem

Flowchart 16

Feature Extraction Desirable Properties Robust to inherent variations. Good discriminative ability. Features Explored Appearance features Face features Feature length = 4096; VGG-16 feature Location transition 17 Feature length = 4096; AlexNet feature Feature length = 9 x 9 x num of cameras.

Appearance Features Input: Person BB Output: Appearance-based features from last FC layer AlexNet 18

Face Features Input: Face BB Output: Face features from last FC layer VGG-16 19

Transition Probability Predict most probable paths within and across cameras 20

Inference Algorithm maxwp. W s.t. W [0,1], W1 = 1, 1TW = 1 Greedy approach of selecting largest probability sequentially 21

Database and Protocols CamNeT 8 cameras covering indoor and outdoor scenes at a university, more than 16,000 images of 50 people. 640x480 images, @20-30fps Protocol 22 Use Scenario 1. IDs not unique - manual tagging performed. Upto 6-7 simultaneous observations. Zhang, Shu, et al. "A camera network tracking (CamNeT) dataset and performance baseline." 2015 IEEE Winter Conference on Applications of Computer Vision. IEEE, 2015.

Database and Protocols Erroneous tracking ground truth 23 Incorrect ID tagging

Database and Protocols DukeMTMC 8 cameras, more than 2M frames of 2,700 people. 1920x1080 images, @60fps Protocol Use only training data for experiments. Use camera1 and camera3 video data for attribute feature evaluation. Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking. E. Ristani, F. Solera, R. S. Zou, R. Cucchiara and C. Tomasi. ECCV 2016 Workshop on Benchmarking Multi-Target Tracking. 24

Evaluation Metric Use ROC curves and the area under the curve (AUC) for evaluating data association results. Use a continuous re-identification evaluation metric for person tracking: 25

Results Evaluated performances of individual features for tracking achieving AUC scores of: Face features 96.56% Attribute features 99.37% Location transition 98.28% CamNeT 26 Appearance features performed well even at low FAR. The performance of face features deteriorated because of low resolution.

Results Evaluated performances of individual features for tracking achieving AUC scores of: DukeMTMC 27 Face features 92.07 Attribute features 99.99% Location transition 98.73%

Inference Results Inference error rates using proposed entity association algorithm CamNeT: Face features 4.67% Attribute features 2.9% Location transition 4.49% DukeMTMC: 28 Face features 12.07% Attribute features 0.01% Location transition 0.5%

Comparison CamNet dataset: Crossing fragments (XFrag): The number of true associations missed by the tracking system. Method XFrag Baseline results [1] 27 Method in [2] 24 Ours 5 [1] Shu Zhang, Elliot Staudt, Tim Faltemier, and Amit K Roy-Chowdhury. A camera network tracking (camnet) dataset and performance baseline. In WACV, 2015 [2] Bi Song and Amit K Roy-Chowdhury. Robust tracking in a camera network: A multi-objective optimization framework. IEEE Journal of Selected Topics in Signal Processing, 2008 29

Comparison DukeMTMC dataset: Fragmentation: The number of identity switches in the tracking result, when the corresponding ground-truth identity does not change. Method Cam1 Cam2 Cam3 Cam4 Cam5 Cam6 Cam7 Cam8 Baseline results [1] 366 1929 336 403 292 3370 675 365 Ours 34 47 102 42 69 84 139 12 [1] Ergys Ristani, Francesco Solera, Roger Zou, Rita Cucchiara, and Carlo Tomasi. Performance measures and a data set for multi-target, multi-camera tracking. In European Conference on Computer Vision. Springer, 2016 30

Contribution Algorithm Within the context of data association, we introduce a learning perspective to the tracking problem. Does not require temporally contiguous sequence of video data. Minimal assumptions Applications The framework can be extended to a variety of data types: Multimodal biometrics Person wardrobe model - Clothing Impact 31 Pave the way towards future research in this direction. Encourage incorporating other constraints like speed, travel time etc.

Future Work Develop a learning model to recover from association errors. Minimize association errors in Entry-Exit case.

Thank You!