MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection

Size: px

Start display at page:

Download "MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection"

Gwenda Wood
6 years ago
Views:

Byungjae Lee¹, Songguo Jin¹, Enkhbayar Erdenee¹, Mi Young

1 MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection ILSVRC 2016 Object Detection from Video Byungjae Lee¹, Songguo Jin¹, Enkhbayar Erdenee¹, Mi Young Nam², Young Gui Jung², Phill Kyu Rhee¹ Inha University¹, NaeulTech²

2 Results with additional training data Object Detection from Video (VID) 2 nd place (map: 73.15%) Object Detection/Tracking from Video (VID) 2 nd place (map: 49.09%) 2

3 Overview I. Faster R-CNN Object Detector + Context region + Larger feature map + Ensemble + Data configuration II. MCMOT: Multi-Class Multi-Object Tracking Tracking by Detection Detection: Ensemble of CNNs Tracking: MCMOT using CPD S. Ren, K. He, R. Girshick, & J. Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. TPAMI B. Lee, E. Erdenee, S. Jin, & P. Rhee. Multi-Class Multi-Object Tracking using Changing Point Detection. arxiv

4 I. Faster R-CNN Object Detector S. Ren, K. He, R. Girshick, & J. Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. TPAMI

5 Object Detection from Video Challenge I: Small Object 5 * Figure from J. Li et al. Scale-aware Fast R-CNN for Pedestrian Detection. arxiv 2015.

6 Object Detection from Video Challenge I: Small Object Solution Larger feature map 6 * Figure from J. Li et al. Scale-aware Fast R-CNN for Pedestrian Detection. arxiv 2015.

7 Object Detection from Video Challenge II: Blurred Object 7 * Figure from Y. Zhu et al. segdeepm: Exploiting Segmentation and Context in Deep Neural Networks for Object Detection. CVPR 2015.

8 Object Detection from Video Challenge II: Blurred Object Solution Context Region 8 * Figure from Y. Zhu et al. segdeepm: Exploiting Segmentation and Context in Deep Neural Networks for Object Detection. CVPR 2015.

9 Network Architecture I. Faster R-CNN with VGG16 Conv1 Conv2 Conv3 Convolutional Network Region Proposal Network RoI Pooling layer FC FC Class layer Softmax class loss Conv4 Conv5 Bbox. reg. loss pool3 pool4 Bbox layer pool2 pool1 Karen Simonyan, & Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. arxiv S. Ren, K. He, R. Girshick, & J. Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. TPAMI

10 Network Architecture I. Faster R-CNN with VGG16 Conv1 Conv2 Conv3 Convolutional Network Region Proposal Network RoI Pooling layer FC FC Class layer Softmax class loss Conv4 Conv5 Bbox. reg. loss pool3 pool4 Bbox layer pool2 pool1 Karen Simonyan, & Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. arxiv S. Ren, K. He, R. Girshick, & J. Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. TPAMI

11 Network Architecture I. Faster R-CNN with VGG16 + Larger feature map (remove pool4 layer) (+2.9% map) Conv1 Conv2 Conv3 Convolutional Network Region Proposal Network RoI Pooling layer FC FC Class layer Softmax class loss Conv4ˈ Bbox. reg. loss Bbox layer pool3 pool2 pool1 J. Li, X. Liang, S. Shen, T. Xu, & S. Yan. Scale-aware Fast R-CNN for Pedestrian Detection. arxiv S. Ren, K. He, R. Girshick, & J. Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. TPAMI

1.5 Concat layer Bbox. reg. loss pool1 Bbox layer RoI Pooling Context layer FC FC S. Gidaris, & N. Komodakis.

12 Network Architecture I. Faster R-CNN with VGG16 + Larger feature map (remove pool4 layer) (+2.9% map) + Context region (+2.6% map) Conv1 Conv2 Convolutional Network Region Proposal Network RoI Pooling layer FC FC Conv3 Conv4ˈ Class layer Softmax class loss pool2 pool3 Box * 1.5 Concat layer Bbox. reg. loss pool1 Bbox layer RoI Pooling Context layer FC FC S. Gidaris, & N. Komodakis. Object detection via a multi-region & semantic segmentation-aware CNN model. CVPR S. Ren, K. He, R. Girshick, & J. Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. TPAMI

13 Training Data Configuration Overview of ILSVRC VID Dataset ILSVRC VID Training Validation Images Snippets * Images are from the ILSVRC VID dataset 13

14 Training Data Configuration Overview of ILSVRC VID Dataset ILSVRC VID Training Validation Images Snippets Redundant images within each snippet Diversity is too low to train CNN * Images are from the ILSVRC VID dataset 14

15 Training Data Configuration Overview of ILSVRC VID Dataset ILSVRC VID Training Validation Images Snippets Redundant images within each snippet Diversity is too low to train CNN We need more data * Images are from the ILSVRC VID dataset 15

16 Training Data Configuration ILSVRC DET ILSVRC VID Additional Data COCO DET ImageNet CLS Pre-trained Model 16

17 Training Data Configuration ILSVRC DET ILSVRC VID Additional Data COCO DET ImageNet CLS Pre-trained Model Fine-tune COCO DET Pre-trained Model 17

18 Training Data Configuration ILSVRC DET ILSVRC VID Additional Data COCO DET Sample 2K images per class (30 classes) Sample 2K images per class 100 images per class ImageNet CLS Pre-trained Model Fine-tune COCO DET Pre-trained Model 18

19 Training Data Configuration ILSVRC DET ILSVRC VID Additional Data COCO DET Sample 2K images per class (30 classes) Sample 2K images per class 100 images per class ImageNet CLS Pre-trained Model Fine-tune Training Dataset COCO DET Pre-trained Model 19

20 Training Data Configuration ILSVRC DET ILSVRC VID Additional Data COCO DET Sample 2K images per class (30 classes) Sample 2K images per class 100 images per class ImageNet CLS Pre-trained Model Fine-tune Training Dataset COCO DET Pre-trained Model Fine-tune Final Model 20

21 Detection Components VGG 16 map(%) Baseline 70.7% ResNet-101 map(%) Baseline 78.8% * We used 12 anchors for RPN * Performances are evaluated on VID minival (only used initial 30 frames per each snippet, total 16,524 images) 21

22 Detection Components VGG 16 map(%) Baseline 70.7% + Larger feature map 73.6% + Context layer 76.2% ResNet-101 map(%) Baseline 78.8% * We used 12 anchors for RPN * Performances are evaluated on VID minival (only used initial 30 frames per each snippet, total 16,524 images) 22

23 Detection Components VGG 16 map(%) Baseline 70.7% + Larger feature map 73.6% + Context layer 76.2% ResNet-101 map(%) Baseline 78.8% Faster R-CNN VGG16 Faster R-CNN ResNet Detection Results Combination MCMOT Tracking 82.3% map * We used 12 anchors for RPN * Performances are evaluated on VID minival (only used initial 30 frames per each snippet, total 16,524 images) 23

24 II. MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection B. Lee, E. Erdenee, S. Jin, & P. Rhee. Multi-Class Multi-Object Tracking using Changing Point Detection. arxiv

25 map(%) Motivation ILSVRC Object Detection Performance Year Object detector becomes robust Should we use complex multi-object tracking algorithm? 25

26 Motivation Based on high performance detection, simple & fast MOT algorithm can achieve competitive result 26

27 Results ILSVRC2016 Object Detection/Tracking from Video (VID) 2 nd place (map: 49.09%) with additional training data B. Lee, E. Erdenee, S. Jin, & P. Rhee. Multi-Class Multi-Object Tracking using Changing Point Detection. arxiv

28 MOTA(%) MOTA(%) Results ILSVRC2016 Object Detection/Tracking from Video (VID) 2 nd place (map: 49.09%) with additional training data Our MCMOT also achieves state-of-the-art results in different MOT datasets Multiple Object Tracking Benchmark 2016 KITTI Tracking Benchmark (Car) MCMOT NOMTwSDP16 KFILDAwSDP EAMTT AMPL 68 MCMOT RBPF DuEye NOMT CNN-OCC Method Method B. Lee, E. Erdenee, S. Jin, & P. Rhee. Multi-Class Multi-Object Tracking using Changing Point Detection. arxiv 2016.

29 Tracking by Detection Object Detection Sequence Multi-Object Tracking t Trajectory Object Detection: Ensemble of CNNs Multi-Object Tracking: MCMOT using CPD B. Lee, E. Erdenee, S. Jin, & P. Rhee. Multi-Class Multi-Object Tracking using Changing Point Detection. arxiv

30 MCMOT using CPD Video sequence (a) Likelihood Calculation (b) Track Segment Creation t t Change point Change point Change point Forward-backward Validation (c) Changing Point Detection t (d) Trajectory Combination MCMOT Trajectory Result B. Lee, E. Erdenee, S. Jin, & P. Rhee. Multi-Class Multi-Object Tracking using Changing Point Detection. arxiv

31 Track Segment Creation MCMC-based MOT approach Changing number of moving objects are challenging, which require high computation overheads due to a high-dimensional state space Separating motion dynamics The method separates the motion dynamic model of Bayesian filter into the entity transitions and motion moves No dimension variation in the iteration loop by separating the moves of birth and death H. Sakaino. Video-based tracking, learning, and recognition method for multiple moving objects. IEEE trans. On circuits and systems for video technology B. Lee, E. Erdenee, S. Jin, & P. Rhee. Multi-Class Multi-Object Tracking using Changing Point Detection. arxiv

32 Track Segment Creation Estimation of entity state transition (Birth, Death) The entity transitions are modeled as the birth and death events We estimate the entity prior by data-driven approach, instead of the inside of MCMC loop H. Sakaino. Video-based tracking, learning, and recognition method for multiple moving objects. IEEE trans. On circuits and systems for video technology B. Lee, E. Erdenee, S. Jin, & P. Rhee. Multi-Class Multi-Object Tracking using Changing Point Detection. arxiv

33 Track Segment Creation Separating motion dynamics Pros Since the Markov chain has no dimension variation in the iteration loop, it can reach to stationary states with less computation overhead Cons such a simple approach cannot deal with complex situations that occur in MOT Many of them are suffered from track drifts due to appearance variations Drift problem is attacked by a CPD algorithm H. Sakaino. Video-based tracking, learning, and recognition method for multiple moving objects. IEEE trans. On circuits and systems for video technology B. Lee, E. Erdenee, S. Jin, & P. Rhee. Multi-Class Multi-Object Tracking using Changing Point Detection. arxiv

34 Changing Point Detection Two-Stage Learning for Changing Point Detection Drifts in MCMOT are investigated by detection such abrupt change points between stationary time series that represent track segment A possible track drift is determined by a changing point detection 2 nd level time series is built using the scanned average responses to reduce outliers in the time series * Figure from J. Takeuchi, & K. Yamanishi. A Unifying Framework for Detecting Outliers and Change Points from Time Series. TKDE B. Lee, E. Erdenee, S. Jin, & P. Rhee. Multi-Class Multi-Object Tracking using Changing Point Detection. arxiv

35 Changing Point Detection Track segments Changing Point Detection Low CPD score High Forward-backward Validation Keep confident segments Low FB error High Exclude unstable segments Confident segments J. Takeuchi, & K. Yamanishi. A Unifying Framework for Detecting Outliers and Change Points from Time Series. TKDE B. Lee, E. Erdenee, S. Jin, & P. Rhee. Multi-Class Multi-Object Tracking using Changing Point Detection. arxiv

MOT16-02 #438 MOT16-02 #440 MOT16-02 #442

36 Changing Point Detection MOT16-09 #170 MOT16-09 #172 MOT16-09 #174 MOT16-07 #179 MOT16-02 #438 MOT16-02 #440 MOT16-02 #442 MOT16-02 #444 * Images are from MOT 2016 benchmark J. Takeuchi, & K. Yamanishi. A Unifying Framework for Detecting Outliers and Change Points from Time Series. TKDE B. Lee, E. Erdenee, S. Jin, & P. Rhee. Multi-Class Multi-Object Tracking using Changing Point Detection. arxiv

Tracking Speed Tracking performances comparison on the MOT benchmark 2016 The timing excludes detection time With a Titan X Maxwell GPU, the detector

37 Tracking Speed Tracking performances comparison on the MOT benchmark 2016 The timing excludes detection time With a Titan X Maxwell GPU, the detector runs at approximately 3.5 FPS B. Lee, E. Erdenee, S. Jin, & P. Rhee. Multi-Class Multi-Object Tracking using Changing Point Detection. arxiv

38 Results ImageNet VID MOT Benchmark KITTI Tracking Benchmark 38

39 Acknowledgement 39

40 Thank You

arxiv: v1 [cs.cv] 30 Aug 2016

arxiv: v1 [cs.cv] 30 Aug 2016 arxiv:1608.08434v1 [cs.cv] 30 Aug 2016 Multi-Class Multi-Object Tracking using Changing Point Detection Byungjae Lee 1, Enkhbayar Erdenee 1, Songguo Jin 1, and Phill Kyu Rhee 1 Inha University 1 Abstract.