with Deep Learning A Review of Person Re-identification Xi Li College of Computer Science, Zhejiang University

Similar documents
Datasets and Representation Learning

Deeply-Learned Part-Aligned Representations for Person Re-Identification

LARGE-SCALE PERSON RE-IDENTIFICATION AS RETRIEVAL

Pyramid Person Matching Network for Person Re-identification

arxiv: v1 [cs.cv] 24 Jul 2018

Multi-region Bilinear Convolutional Neural Networks for Person Re-Identification

arxiv: v2 [cs.cv] 8 May 2018

arxiv: v1 [cs.cv] 20 Dec 2017

arxiv: v1 [cs.cv] 25 Sep 2017

arxiv: v1 [cs.cv] 23 Nov 2017

Person Re-identification with Deep Similarity-Guided Graph Neural Network

Improving Face Recognition by Exploring Local Features with Visual Attention

arxiv: v1 [cs.cv] 7 Jul 2017

Deep View-Aware Metric Learning for Person Re-Identification

Efficient and Deep Person Re-Identification using Multi-Level Similarity

PERSON RE-IDENTIFICATION USING VISUAL ATTENTION. Alireza Rahimpour, Liu Liu, Ali Taalimi, Yang Song, Hairong Qi

Cascade Region Regression for Robust Object Detection

Person Transfer GAN to Bridge Domain Gap for Person Re-Identification

Efficient Segmentation-Aided Text Detection For Intelligent Robots

Xiaowei Hu* Lei Zhu* Chi-Wing Fu Jing Qin Pheng-Ann Heng

arxiv: v1 [cs.cv] 19 Nov 2018

Classifying a specific image region using convolutional nets with an ROI mask as input

JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS. Zhao Chen Machine Learning Intern, NVIDIA

arxiv: v1 [cs.cv] 29 Dec 2018

CNN for Low Level Image Processing. Huanjing Yue

Deep Spatial-Temporal Fusion Network for Video-Based Person Re-Identification

A Pose-Sensitive Embedding for Person Re-Identification with Expanded Cross Neighborhood Re-Ranking

PERSON RE-IDENTIFICATION USING VISUAL ATTENTION. Alireza Rahimpour, Liu Liu, Yang Song, Hairong Qi

arxiv: v1 [cs.cv] 19 Oct 2016

arxiv: v1 [cs.cv] 19 Nov 2018

Part-Aligned Bilinear Representations for Person Re-identification

arxiv: v1 [cs.cv] 21 Jul 2018

arxiv: v3 [cs.cv] 20 Apr 2018

Geometry-aware Traffic Flow Analysis by Detection and Tracking

Learning Deep Structured Models for Semantic Segmentation. Guosheng Lin

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing

Supplementary Materials for Salient Object Detection: A

Deep Neural Networks with Inexact Matching for Person Re-Identification

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

arxiv: v2 [cs.cv] 21 Dec 2017

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors

The Devil is in the Middle: Exploiting Mid-level Representations for Cross-Domain Instance Matching

Yiqi Yan. May 10, 2017

Photo OCR ( )

Stanford University Packing and Padding, Coupled Multi-index for Accurate Image Retrieval.

Human Pose Estimation using Global and Local Normalization. Ke Sun, Cuiling Lan, Junliang Xing, Wenjun Zeng, Dong Liu, Jingdong Wang

Robust Face Recognition Based on Convolutional Neural Network

Graph Correspondence Transfer for Person Re-identification

Instance-aware Semantic Segmentation via Multi-task Network Cascades

arxiv: v3 [cs.cv] 4 Sep 2018

arxiv: v1 [cs.cv] 29 Sep 2016

BIOMETRIC recognition, especially face recognition and

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

arxiv: v1 [cs.cv] 30 Jul 2018

Face Recognition A Deep Learning Approach

Multi-Level Factorisation Net for Person Re-Identification

Detecting and Parsing of Visual Objects: Humans and Animals. Alan Yuille (UCLA)

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

PERSON re-identification [1] is an important task in video

Lecture 7: Semantic Segmentation

Person Re-identification for Improved Multi-person Multi-camera Tracking by Continuous Entity Association

arxiv: v2 [cs.cv] 26 Sep 2016

arxiv: v1 [cs.cv] 31 Mar 2017

MULTI-VIEW FEATURE FUSION NETWORK FOR VEHICLE RE- IDENTIFICATION

PErson re-identification (re-id) is usually viewed as an

Flow-Based Video Recognition

Localization Guided Learning for Pedestrian Attribute Recognition

P-CNN: Pose-based CNN Features for Action Recognition. Iman Rezazadeh

Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks

Hierarchical Discriminative Learning for Visible Thermal Person Re-Identification

Video Gesture Recognition with RGB-D-S Data Based on 3D Convolutional Networks

arxiv: v1 [cs.cv] 11 Aug 2017

Discovering Disentangled Representations with the F Statistic Loss

CS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN

Regionlet Object Detector with Hand-crafted and CNN Feature

Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs

End-to-End Deep Kronecker-Product Matching for Person Re-identification

Object Detection Based on Deep Learning

Pre-print version. arxiv: v3 [cs.cv] 19 Jul Survey on Deep Learning Techniques for Person Re-Identification Task.

arxiv: v1 [cs.cv] 20 Mar 2018

Neural Person Search Machines

arxiv: v1 [cs.cv] 16 Nov 2015

An efficient face recognition algorithm based on multi-kernel regularization learning

Human Pose Estimation with Deep Learning. Wei Yang

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Mask R-CNN. Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018

Deep Spatial Pyramid Ensemble for Cultural Event Recognition

ABC-CNN: Attention Based CNN for Visual Question Answering

Spatial Localization and Detection. Lecture 8-1

Proceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol I IMECS 2018, March 14-16, 2018, Hong Kong

arxiv: v2 [cs.cv] 18 Jul 2017

arxiv: v2 [cs.cv] 17 Apr 2018

arxiv: v1 [cs.cv] 2 Sep 2018

Universiteit Leiden Computer Science

Supplementary Material: Pixelwise Instance Segmentation with a Dynamically Instantiated Network

Deep Spatial Feature Reconstruction for Partial Person Re-identification: Alignment-free Approach

Fully Convolutional Networks for Semantic Segmentation

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Embedding Deep Metric for Person Re-identification: A Study Against Large Variations

Domain Adaptation from Synthesis to Reality in Single-model Detector for Video Smoke Detection

Transcription:

A Review of Person Re-identification with Deep Learning Xi Li College of Computer Science, Zhejiang University http://mypage.zju.edu.cn/xilics Email: xilizju@zju.edu.cn

Person Re-identification Associate the person images across different cameras. query gallery

General Solutions Step 1: Feature Extraction Extracting features for every person images Step 2: Feature Matching Matching features to calculate the similarity score From 2014, deep models were used to improve these two parts Feature Extractor (e.g. CNNs) Image Feature Person Image Person Image Feature Extractor Feature Extractor Feature Feature Feature Matching Feature Extraction Feature Matching

Feature Matching Methods Matching based on pre-defined locations Global, local stripes, grid patches Matching based on semantic regions Person parts, salient regions, attention regions semantic Global Segmentation Stripes Grid Part Attention

Deep Learning Based Methods different matching or partitioning strategies Stripes Grids Attention Pose

Deep Learning Based Methods different matching or partitioning strategies Stripes Grids Attention Pose

Stripes Based Matching: DeepMetric (2014) Dividing person image into 3 horizontal stripes Extracting CNN features from a pair of images Combining features within each stripe Computing similarity scores with fused features Dong Yi, Zhen Lei, and Stan Z. Li. Deep metric learning for practical person re-identification. ICPR, 2014.

Wei Li, Rui Zhao, Tong Xiao, and Xiaogang Wang. DeepReID: Deep filter pairing neural network for person re-identification. In CVPR, 2014. Stripes Based Matching: DeepReID (2014) Dividing person image into horizontal stripes Extracting CNN features from a pair of images Patch matching within each stripe Rank1 DeepReID CUHK03 20.7% Patch matching: 1. Suppose each stripe has 4 patches 2. Match a pair of feature within one stripe 3. Get 4x4 response map for one channel 4. First channel detects blue, another green

Wei Li, Rui Zhao, Tong Xiao, and Xiaogang Wang. AlignedReID: Surpassing Human-Level Performance in Person Re-Identification. ArXiv, 2017. Stripes Based Matching: AlignedReID (2017) Combing local feature and global feature Local using pre-defined stripes but dynamic matching Triplet loss by finding the shortest matching path.

Deep Learning Based Methods different matching or partitioning strategies Stripes Grids Attention Pose

Grid Patches Based: IDLA (2015) DeepReID IDLA Compute differences between each pixel and its 5x5 neighborhood pixels Concatenating the differences for similarity learning CUHK03 20.7% 54.7% [1] IDLA (2015) [2] PersonNet (2016) (deeper CNN) [1] E. Ahmed, M. Jones, and T. K. Marks. An improved deep learning architecture for person re-identification. In CVPR, 2015. [2] L. Wu, C. Shen, and A. van den Hengel. Personnet: Person re-identification with deep convolutional neural networks. CoRR, abs/1601.07255, 2016.

Challenge of Pre-defined Matching Spatial location misalignment due to detection or pose changes CNN based online feature Image 1 Feature Maps Matching Network Image 2 Feature Maps

Yaqing Zhang, Xi Li, Liming Zhao, Zhongfei Zhang. Semantics-Aware Deep Correspondence Structure Learning for Robust Person Re-identification. IJCAI, 2016. Online Feature Matching: DCSL (2016) Deep Correspondence Learning instead of manually defining the matching grid patches Adaptively learn a hierarchical data-driven feature matching function IDLA DCSL CUHK03 54.7% 80.2% architecture pyramid matching

Deep Learning Based Methods different matching or partitioning strategies Stripes Grids Attention Pose

Liming Zhao, Xi Li, Yueting Zhuang, and Jingdong Wang. Deeply-Learned Part-Aligned Representations for Person Re-Identification. ICCV, 2017. Part Regions Learning: Part-Aligned (2017) Learn the key regions for Embedding instead of Matching Align the feature with the learnt region maps Extracting features and then calculating Euclidean distances C=512 Feature Maps 512 K -d 512-d Rank1 on CUHK03: 85.4% Rank1 on Market: 81.0% DCSL CNN Part Net Aligned Feature DLPAR CUHK03 80.2% 85.4% subnetworks features

Part Regions Learning: Part-Aligned (2017) An end-to-end solution to jointly Learn the reid-sensitive maps for person matching Learn the part-aligned deep representation CNN CNN proposed module proposed module x + x Triplet Loss Feature Maps Part Maps Detector Learned masks Weighted Features CNN proposed module x Conv Layers Masking Framework Proposed module

Part Regions Learning: Part-Aligned (2017) Learn the maps without extra annotations. ReID-sensitive regions Different with traditional part segmentation

Attention Regions Learning: HydraPlus-Net (2017) Learn features by fusing three attention module with Softmax loss Learn attention map from different scale for each module Apply the attention map on different layers of the network DLPAR HPNet CUHK03 85.4% 91.8% Market 81.0% 76.9% Framework Attention module F(α 2 ) Xihui Liu, Haiyu Zhao, Maoqing Tian, Lu Sheng, Jing Shao, Shuai Yi, Junjie Yan, Xiaogang Wang: HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis. ICCV 2017

Attention Regions Learning: HydraPlus-Net (2017) DLPAR HPNet (a) Attention maps in three different levels or scales CUHK03 85.4% 91.8% Market 81.0% 76.9% (b,c ) Each level maps contain 8 channels (b,c ) One channel learn one part or things (bags) Xihui Liu, Haiyu Zhao, Maoqing Tian, Lu Sheng, Jing Shao, Shuai Yi, Junjie Yan, Xiaogang Wang: HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis. ICCV 2017

Attention Regions Learning: HA-CNN (2018) One global feature extraction branch Several local feature extraction branches (3 branches illustrated in figure) Harmonious Attention: learn a set of complementary attention maps hard (regional) attention for the local branch soft (pixel-level and scale-level) attention for the global branch HPNet HA-CNN CUHK03 91.8% - Market 76.9% 91.2% Wei Li, Xiatian Zhu, and Shaogang Gong: Harmonious Attention Network for Person Re-Identification. CVPR 2018

Deep Learning Based Methods different matching or partitioning strategies Stripes Grids Attention Pose

Pose-driven Embedding: PDC (2017) To better alleviate the challenges from pose variations Propose a FEN to learn and normalize human parts Estimate 14 joints using a separated pose estimator (FCN). Merge 14 joints to 6 parts and normalize to pre-defined locations Generate a transformed and modified part image HPNet PDC CUHK03 91.8% 88.7% Market 76.9% 84.1% Feature Embedding sub-net (FEN) Feature Embedding sub-net (FEN) Chi Su, Jianing Li, Shiliang Zhang, Junliang Xing, Wen Gao, and Qi Tian: Pose-driven Deep Convolutional Model for Person Re-identification. ICCV 2017

Pose-driven Embedding: PDC (2017) Global feature learnt from original image with Softmax loss Part feature learnt from modified part image with Softmax loss Fusing global and part features with a sub-net HPNet PDC CUHK03 91.8% 88.7% Market 76.9% 84.1% Framework Feature Weighting sub-net Chi Su, Jianing Li, Shiliang Zhang, Junliang Xing, Wen Gao, and Qi Tian: Pose-driven Deep Convolutional Model for Person Re-identification. ICCV 2017

Pose-driven Embedding: PSE (2018) Separately trained View Predictor and off-the-shelf pose estimator Combine pose maps and RGB images as input Using view predictions to select one of three CNN units Embedding the pose and view information by simply training the CNN PDC PSE CUHK03 88.7% - Market 84.1% 90.3% M. Saquib Sarfraz, Arne Schumann, Andreas Eberle, Rainer Stiefelhagen: A Pose-Sensitive Embedding for Person Re-Identification with Expanded Cross Neighborhood Re-Ranking. CVPR 2018

Performances Methods Publication Types CUHK03 Market -1501 DeepReID CVPR 2014 Stripes + Matching 20.7 - IDLA CVPR 2015 Grid Patches + Matching 54.7 - PersonNet ArXiv 2016 Grid Patches + Matching 64.8 37.2 DCSL IJCAI 2016 Structure Learning+ Matching 80.2 - HydraPlus-Net ICCV 2017 Attention + Embedding 91.8 76.9 Part-Aligned ICCV 2017 Attention + Embedding 85.4 81.0 PDC ICCV 2017 Pose + Embedding 88.7 84.1 PSE CVPR 2018 Pose + Embedding - 90.3 HA-CNN CVPR 2018 Attention + Embedding - 91.2 AlignedReID ArXiv 2017 Stripes Association + Embedding 92.4 91.8

Conclusion Person ReID with deep learning Extracting feature maps using CNN Matching features by comparing on different locations Matching on pre-defined spatial locations Matching with learnt semantic regions Stripes Grid Patches Learn correspondence implicitly in the network Learn key part regions for matching Learn attention regions for feature embedding Using off-the-shelf pose/view estimator for feature embedding

Acknowledgement Dr. Liming Zhao Dr. Yaqing Zhang