Online Object Tracking with Proposal Selection. Yang Hua Karteek Alahari Cordelia Schmid Inria Grenoble Rhône-Alpes, France

Size: px

Start display at page:

Download "Online Object Tracking with Proposal Selection. Yang Hua Karteek Alahari Cordelia Schmid Inria Grenoble Rhône-Alpes, France"

Blanche Richardson
5 years ago
Views:

1 Online Object Tracking with Proposal Selection Yang Hua Karteek Alahari Cordelia Schmid Inria Grenoble Rhône-Alpes, France

2 Outline 2 Background Our approach Experimental results Summary

3 Background 3 Tracking-by-detection paradigm has been extremely successful on diverse benchmarks [Wu et al., 2013] [Kristan et al., 2013/14] [Smeulders et al., 2014] Train initial model on frame 1 T = 2 N Do detection on frame T Select tracking result with best detection score Update model for frame T+1 Y. Wu, J. Lim, and M.-H. Yang. Online object tracking: A benchmark. In CVPR, A. W. M. Smeulders et al. Visual tracking: an experimental survey. PAMI, 2014 M. Kristan et al. The visual object tracking VOT2013/2014 challenge results. In ICCV/ECCV VOT Challenge Workshop, 2013/2014.

4 Background 4 Successful tracking-by-detection methods Struck [Hare et al., 2011] PLT 13/14 [Heng et al., 2012] DSST [Danelljan et al., 2014] S. Hare, A. Saffari, and P. H. S. Torr. Struck: Structured output tracking with kernels. In ICCV, C.K. Heng, Y. Sumio, M. Yuichi, T. Hajime, Shrink boost for selecting multi-lbp histogram features in object detection. In CVPR, 2012 M. Danelljan, G. Hager, F. Shahbaz Khan, and M. Felsberg. Accurate scale estimation for robust visual tracking. In BMVC, 2014.

5 Background 5 Two key ingredients Discriminative learning

6 Background 6 Two key ingredients Discriminative learning Similarity Dissimilarity

7 Background 7 Two key ingredients Discriminative learning Pixel-accurate localization IoU = 0.9 IoU = 0.7 IoU = 0.5

8 Background 8 Two key ingredients Discriminative learning Pixel-accurate localization Train initial model on frame 1 T = 2 N Do detection on frame T Select tracking result with best detection score Update model for frame T+1

9 Background 9 Two key ingredients Discriminative learning Pixel-accurate localization Train initial discriminative model on frame 1 T = 2 N Do detection with pixel-accurate localization on frame T Select tracking result with best detection score Update discriminative model for frame T+1

10 Background 10 Limitations of tracking-by-detection approaches Can not handle challenging conditions where an object undergoes transformations, e.g., severe rotation DSST Ours Groundtruth Frame 1 Frame 10 Frame 30

11 Background 11 Limitations of tracking-by-detection approaches Can not handle challenging conditions where an object undergoes transformations, e.g., severe rotation Select tracking results based on detection score only

12 Our approach: Overview 12 Proposal Selection Tracker [Hua et al., 2015] Train initial discriminative model on frame 1 T = 2 N Do detection with pixel-accurate localization on frame T Select tracking result with best detection score Update discriminative model for frame T+1 Y. Hua, K. Alahari and C. Schmid. Online Object Tracking with Proposal Selection. In ICCV, 2015

13 Our approach: Overview 13 Proposal Selection Tracker [Hua et al., 2015] T = 2 N Train initial discriminative model on frame 1 Propose regular detection candidates on frame T Propose geometry transformed candidates on frame T Select tracking result with three cues Update discriminative model for frame T+1 Y. Hua, K. Alahari and C. Schmid. Online Object Tracking with Proposal Selection. In ICCV, 2015

a Hough transform voting scheme Detection proposal Geometry proposal Ground truth T. Brox and J.

14 Our approach: Proposal Selection Tracker 14 Geometry proposal Compute frame-to-frame pixel correspondences with optical flow [Brox and Malik, 2011] Estimate similarity transformations with a Hough transform voting scheme Detection proposal Geometry proposal Ground truth T. Brox and J. Malik. Large displacement optical flow: Descriptor matching in variational motion estimation. PAMI, 2011.

15 Our approach: Proposal Selection Tracker 15 Multiple cues for selection Detection scores Edgebox score [Zitnick and Dollár, 2014], originally from edge response [Dollár and Zitnick, 2013] High edgebox score Low edgebox score C. L. Zitnick and P. Dollár. Edge boxes: Locating object proposals from edges. In ECCV, P. Dollár and C. L. Zitnick. Structured forests for fast edge detection. In ICCV, 2013.

16 Our approach: Proposal Selection Tracker 16 Multiple cues for selection Edgebox scores from edge responses and motion boundaries [Weinzaepfel et al., 2015] are complementary Edge Responses Motion boundaries P. Weinzaepfel, J. Revaud, Z. Harchaoui, and C. Schmid. Learning to detect motion boundaries. In CVPR, 2015.

17 Our approach: Proposal Selection Tracker 17 How to combine multiple cues? When detection scores of the top candidates are very similar, we select the one with the best edgebox measure Low edgebox score, but high detection score High edgebox score, but low detection score

18 Our approach: Overview 18 Proposal Selection Tracker Our submission Train initial discriminative model on frame 1 T = 2 N Propose regular detection candidates on frame T Propose geometry transformed candidates on frame T Select tracking result with three cues Update discriminative model for frame T+1

19 Our approach: Overview 19 Proposal Selection Tracker Our submission A simplified Proposal Selection Tracker without optical flow calculation T = 2 N Train initial discriminative model on frame 1 Propose regular detection candidates on frame T Propose non-axis aligned detection candidates on frame T Select tracking result with two cues Update discriminative model for frame T+1

20 Our approach: Overview 20 Proposal Selection Tracker Our submission A simplified Proposal Selection Tracker without optical flow calculation Same parameters for both VOT-TIR2015 and VOT2015 challenges T = 2 N Train initial discriminative model on frame 1 Propose regular detection candidates on frame T Propose non-axis aligned detection candidates on frame T Select tracking result with two cues Update discriminative model for frame T+1

21 Our approach: Implementation details 21 Baseline tracking-by-detection framework HOG + Linear SVM [Supancic and Ramanan, 2013] Multi-scale detector with 2 pixels scanning step Small patch (e.g. 18x18) handling Additional correlation checking Occlusion handling Selective model updating J.S. Supancic and D. Ramanan. Self-paced learning for long-term tracking. In CVPR, 2013.

22 Experimental results 22 VOT-TIR2015 Challenge Dataset Selection: detection score only Overlap #Failures

23 Experimental results 23 VOT-TIR2015 Challenge Dataset Selection: detection score only Selection: detection + edgebox score Overlap #Failures Overlap #Failures

24 Experimental results 24 VOT2015 Challenge Dataset Selection: detection score only Overlap #Failures

25 Experimental results 25 VOT2015 Challenge Dataset Selection: detection score only Selection: detection + edgebox score Overlap #Failures Overlap #Failures

26 Summary 26 Extending tracking-by-detection as a general proposal and selection scheme New geometry proposals A novel selection scheme based on multiple cues Achieving good performance on VOTTIR-2015 and VOT2015 challenge datasets Source code is released at project page

27 Advertisement 27 Proposal Selection Tracker will be presented at Poster Session 3B (Tuesday, 15 Dec. 2015)

28 Advertisement 28

29 Appendix: List 29 spst: overall framework Dense HOG Occlusion handling Detailed experimental results

30 Appendix: Overall framework 30 spst: overall framework Train initial discriminative model on frame 1 T = 2 N Small patch handling Propose non-axis aligned detection candidates on frame T Propose regular detection candidates on frame T Select tracking result with two cues Occlusion handling Update discriminative model for frame T+1

31 Appendix: Dense HOG 31 Dense HOG

32 Appendix: Dense HOG 32 Dense HOG

33 Appendix: Dense HOG 33 Dense HOG

34 Appendix: Dense HOG 34 Dense HOG

35 Appendix: Occlusion handling 35 Occlusion handling For short-term tracking, we update model with tracking result in every frame Selective model updating: If tracking result in current frame is quite similar with tracking result in the previous frame, probably occlusion happens, we don t update model in current frame Frame 246 Frame 247 Frame 248

36 Experimental results 36 VOT-TIR2015 Challenge Dataset Selection: detection score only Marker Overlap #Failures Baseline

37 Experimental results 37 VOT-TIR2015 Challenge Dataset Selection: detection score only Selection: detection + edgebox score Marker Overlap #Failures Marker Overlap #Failures Baseline

38 Experimental results 38 VOT-TIR2015 Challenge Dataset Selection: detection score only Selection: detection + edgebox score Marker Overlap #Failures Marker Overlap #Failures Baseline Submission

39 Experimental results 39 VOT-TIR2015 Challenge Dataset Selection: detection score only Selection: detection + edgebox score Marker Overlap #Failures Marker Overlap #Failures Baseline Submission

40 Experimental results 40 VOT2015 Challenge Dataset Selection: detection score only Marker Overlap #Failures Baseline

41 Experimental results 41 VOT2015 Challenge Dataset Selection: detection score only Selection: detection + edgebox score Marker Overlap #Failures Marker Overlap #Failures Baseline

42 Experimental results 42 VOT2015 Challenge Dataset Selection: detection score only Selection: detection + edgebox score Marker Overlap #Failures Marker Overlap #Failures Baseline Submission

43 Experimental results 43 VOT2015 Challenge Dataset Selection: detection score only Selection: detection + edgebox score Marker Overlap #Failures Marker Overlap #Failures Baseline Submission

Towards Robust Visual Object Tracking: Proposal Selection & Occlusion Reasoning Yang HUA

Towards Robust Visual Object Tracking: Proposal Selection & Occlusion Reasoning Yang HUA PhD Defense, June 10 2016 Rapporteur Rapporteur Examinateur Examinateur Directeur de these Co-encadrant de these