UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss


UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss AAAI 2018, New Orleans, USA Simon Meister, Junhwa Hur, and Stefan Roth Department of Computer Science, TU Darmstadt

2 Deep Networks for Optical Flow
Deep CNNs for flow, e.g. FlowNet:
- Encoder-decoder network
- Given two images, outputs dense flow
- Real-time inference with high accuracy
- Supervision from synthetic datasets
Fischer et al. (2015), "FlowNet: Learning Optical Flow with Convolutional Networks"

3 Domain Mismatch
[Figure: training domains vs. domains of interest]

4 Training with Realistic Data
Unsupervised deep learning for optical flow:
- Train on the target domain
- No ground truth flow
- Unlabeled frame pairs (e.g. from video)
- Design a proxy loss

5 Unsupervised Loss (Baseline)
- Use classical optical flow constraints [Yu et al.]
- Backward-warp I2 (using w_f) with bilinear sampling [Jaderberg et al.] of I2 at x + w_f(x)
- Data loss E_D: brightness difference of I1 and backward-warped I2
- First-order smoothness loss E_S: difference of neighboring flows
- Brightness constancy: I1(x) ≈ I2(x + w_f(x))
[Diagram: I1, I2, backward warp via w_f, data loss E_D, smoothness loss E_S]
Jaderberg et al. (2015), "Spatial Transformer Networks"
Yu et al. (2016), "Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness"
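The backward warp and brightness-constancy data loss on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's actual (TensorFlow-based) implementation; the function names and the simple L1 penalty are assumptions for clarity.

```python
import numpy as np

def backward_warp(img2, flow):
    """Bilinearly sample img2 at x + w_f(x) for every pixel x.

    img2: (H, W) grayscale image; flow: (H, W, 2) forward flow (dx, dy).
    Where the flow is correct and brightness constancy holds, the
    warped image should match img1.
    """
    h, w = img2.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # Target sampling coordinates x + w_f(x), clamped to the image.
    tx = np.clip(xs + flow[..., 0], 0, w - 1)
    ty = np.clip(ys + flow[..., 1], 0, h - 1)
    x0, y0 = np.floor(tx).astype(int), np.floor(ty).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    ax, ay = tx - x0, ty - y0
    # Bilinear interpolation of the four neighboring pixels.
    return ((1 - ax) * (1 - ay) * img2[y0, x0] +
            ax * (1 - ay) * img2[y0, x1] +
            (1 - ax) * ay * img2[y1, x0] +
            ax * ay * img2[y1, x1])

def brightness_data_loss(img1, img2, flow):
    """Baseline data loss E_D: mean brightness difference between
    I1 and the backward-warped I2 (here a plain L1 penalty; the
    robust penalty used in practice is omitted)."""
    return np.mean(np.abs(img1 - backward_warp(img2, flow)))
```

With a perfect flow the loss vanishes (away from image borders, where clamping distorts the warp); a wrong flow increases it, which is what lets the network learn without ground truth.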

6 Unsupervised Loss (Baseline)
Issues with this most basic loss:
- Lighting changes: brightness constancy violated
- Occlusions: can't compare I1(x) and I2(x + w_f(x))
- First-order smoothness may be limiting
[Diagram: I1, I2, backward warp via w_f, data loss E_D, smoothness loss E_S]

7 UnFlow
Apply advanced ideas from classical optical flow to deep learning:
- Robustness to lighting changes (census transform)
- Occlusion handling (bidirectional flow)
- Second-order smoothness
[Diagram: bidirectional flows w_f, w_b with occlusion flags o_f, o_b; consistency loss E_C, data loss E_D, smoothness loss E_S]

8 UnFlow
- Compute bidirectional flow (w_f, w_b) with a CNN: FlowNetC (or any other optical flow network)
- Forward flow w_f from (I1, I2); backward flow w_b from (I2, I1), computed with shared weights

9 UnFlow
Given bidirectional flow:
- Forward-backward check [Sundaram et al.]: compare w_f(x) and w_b(x + w_f(x))
- The two should be inverse to each other for non-occluded x
- Difference below a threshold? If not, set occlusion flag o_f (swap f/b for o_b)
- Consistency loss E_C penalizes the difference
Sundaram et al. (2010), "Dense Point Trajectories by GPU-Accelerated Large Displacement Optical Flow"
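The forward-backward check can be sketched as follows. This is an illustrative NumPy version with nearest-neighbor lookup and made-up threshold constants (alpha1, alpha2), not the paper's exact formulation:

```python
import numpy as np

def occlusion_mask(w_f, w_b, alpha1=0.01, alpha2=0.5):
    """Forward-backward consistency check (Sundaram et al. style).

    w_f, w_b: (H, W, 2) forward and backward flow. For a non-occluded
    pixel x, the backward flow sampled at x + w_f(x) should be roughly
    -w_f(x). A pixel is flagged occluded (o_f = 1) when the squared
    mismatch exceeds a threshold that grows with the flow magnitude.
    """
    h, w = w_f.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbor lookup of w_b at x + w_f(x)
    # (bilinear sampling would be used in practice).
    tx = np.clip(np.round(xs + w_f[..., 0]).astype(int), 0, w - 1)
    ty = np.clip(np.round(ys + w_f[..., 1]).astype(int), 0, h - 1)
    w_b_at = w_b[ty, tx]
    mismatch = np.sum((w_f + w_b_at) ** 2, axis=-1)
    thresh = alpha1 * (np.sum(w_f ** 2, -1) + np.sum(w_b_at ** 2, -1)) + alpha2
    return (mismatch > thresh).astype(np.uint8)  # 1 = occluded
```

Swapping the roles of w_f and w_b yields the backward occlusion flag o_b; the same mismatch term, on non-occluded pixels, drives the consistency loss E_C.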

10 UnFlow
Bidirectional image-based loss:
- Compare I1 and backward-warped I2 (using w_f), with bilinear sampling [Jaderberg et al.] at x + w_f(x)
- Compare I2 and backward-warped I1 (using w_b)
Jaderberg et al. (2015), "Spatial Transformer Networks"

11 UnFlow
Data loss E_D:
- Census transform [Stein] of I1 and I2(x + w_f(x))
- Invariant to many changes due to lighting
- Applied only at non-occluded pixels (o_f = 0)
- Same for I2 and I1(x + w_b)
[Diagram: 3x3 census transform example, patch intensities mapped to signs relative to the center pixel]
Stein (2004), "Efficient Computation of Optical Flow Using the Census Transform"
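A minimal census transform can illustrate the lighting invariance claimed on this slide. This is a hard 3x3 variant for exposition; UnFlow's actual loss uses a soft, differentiable comparison, and the eps parameter here is an assumption:

```python
import numpy as np

def census_transform(img, eps=0.0):
    """3x3 census transform of a grayscale image (after Stein 2004).

    Each pixel is described by the signs of its 8 neighbors relative
    to the center pixel, so any lighting change that preserves the
    local intensity ordering leaves the descriptor unchanged.
    Returns an (H, W, 8) array of {-1, 0, +1} comparisons.
    """
    p = np.pad(img, 1, mode='edge')  # replicate borders
    h, w = img.shape
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (dy, dx) != (0, 0)]
    out = np.zeros((h, w, 8))
    for k, (dy, dx) in enumerate(offsets):
        diff = p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w] - img
        # Sign of the neighbor-center difference, ignoring tiny diffs.
        out[..., k] = np.sign(diff) * (np.abs(diff) > eps)
    return out
```

Because only intensity orderings matter, a global gain or offset (e.g. 2*img + 10) produces an identical descriptor, which is exactly why the census data loss is far more robust to lighting than raw brightness constancy.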

12 UnFlow
Smoothness loss E_S:
- Second-order regularizer [Trobin et al.]: penalizes large second derivatives of the flow w_f (or w_b)
- Encourages collinear neighbors
- The only loss at occluded pixels (o_f = 1)
Trobin et al. (2008), "An Unbiased Second-Order Prior for High-Accuracy Motion Estimation"
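The "collinear neighbors" property follows directly from discretizing the second derivative. A sketch with a plain L1 penalty (the paper uses a robust penalty; this simplification is an assumption):

```python
import numpy as np

def second_order_smoothness(flow):
    """Second-order smoothness loss E_S (Trobin et al. style sketch).

    Penalizes the discrete second difference of each flow component:
    w(x-1) - 2*w(x) + w(x+1) vanishes whenever three neighbors are
    collinear, so linear flow ramps (e.g. from camera rotation) are
    free while curvature is penalized.
    """
    # Horizontal second differences, both flow channels.
    d2x = flow[:, :-2] - 2 * flow[:, 1:-1] + flow[:, 2:]
    # Vertical second differences.
    d2y = flow[:-2, :] - 2 * flow[1:-1, :] + flow[2:, :]
    return np.mean(np.abs(d2x)) + np.mean(np.abs(d2y))
```

Contrast with a first-order regularizer, which penalizes any flow gradient and therefore biases the solution toward piecewise-constant (fronto-parallel) motion.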

13 Iterative Refinement
- Network stacking [Ilg et al.]: a first FlowNetC followed by a second FlowNetS (weights not shared)
- The second network receives the first flow estimates (w_f, w_b) and the warped images I2(x + w_f), I1(x + w_b)
- UnFlow-C → UnFlow-CS
Ilg et al. (2017), "FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks"

14 Training Schedule
Curriculum (100% unsupervised):
1. SYNTHIA pre-training: large synthetic dataset with simple lighting
2. KITTI raw: large real-world driving dataset, excluding the small number of frames with ground truth flow
No need for specifically generated synthetic optical flow datasets (FlyingChairs, FlyingThings3D, ...)

15 Results

16 Metrics
- Average Endpoint Error (AEE): average Euclidean distance of predicted to ground truth flow vectors
- KITTI Outliers: ratio of pixels where the flow estimate is wrong by at least 3 pixels and at least 5%
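Both metrics are simple to state precisely in code. A sketch of the two definitions as used on KITTI (function names are ours):

```python
import numpy as np

def average_endpoint_error(pred, gt):
    """AEE: mean Euclidean distance between predicted and ground
    truth flow vectors over all (valid) pixels."""
    return np.mean(np.linalg.norm(pred - gt, axis=-1))

def kitti_outlier_ratio(pred, gt):
    """KITTI outliers: fraction of pixels whose endpoint error is
    both > 3 px and > 5% of the ground truth flow magnitude."""
    err = np.linalg.norm(pred - gt, axis=-1)
    mag = np.linalg.norm(gt, axis=-1)
    return np.mean((err > 3.0) & (err > 0.05 * mag))
```

The 5% condition keeps large but proportionally small errors on fast-moving pixels from being counted as outliers.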

17 Loss Ablation (KITTI 2012)
Comparing baseline [Yu et al.] vs. UnFlow-C:
- Brightness constancy → census loss reduces AEE by 35%

Data loss  | Smoothness | Occlusion | AEE (All) | Outliers (All)
Brightness | 1st-order  |           | 7.20      | 31.93%
Census     | 1st-order  |           | 4.66      | 20.85%

Yu et al. (2016), "Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness"

18 Loss Ablation (KITTI 2012)
Comparing baseline [Yu et al.] vs. UnFlow-C:
- 1st-order → 2nd-order smoothness reduces AEE by 5% and outliers by 17%

Data loss  | Smoothness | Occlusion | AEE (All) | Outliers (All)
Brightness | 1st-order  |           | 7.20      | 31.93%
Census     | 1st-order  |           | 4.66      | 20.85%
Census     | 2nd-order  |           | 4.40      | 17.22%

Yu et al. (2016), "Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness"

19 Loss Ablation (KITTI 2012)
Comparing baseline [Yu et al.] vs. UnFlow-C:
- Forward-backward mechanisms (occlusion masking & consistency) reduce AEE by 14%

Data loss  | Smoothness | Occlusion              | AEE (All) | Outliers (All)
Brightness | 1st-order  |                        | 7.20      | 31.93%
Census     | 1st-order  |                        | 4.66      | 20.85%
Census     | 2nd-order  |                        | 4.40      | 17.22%
Census     | 2nd-order  | Forward-backward check | 3.78      | 16.44%

Yu et al. (2016), "Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness"

20 Loss Ablation (KITTI 2012)
Comparing baseline [Yu et al.] vs. UnFlow-C:
- Overall, UnFlow reduces AEE and outliers by 48%
- Similar observations on KITTI 2015

Data loss  | Smoothness | Occlusion              | AEE (All) | Outliers (All)
Brightness | 1st-order  |                        | 7.20      | 31.93%
Census     | 1st-order  |                        | 4.66      | 20.85%
Census     | 2nd-order  |                        | 4.40      | 17.22%
Census     | 2nd-order  | Forward-backward check | 3.78      | 16.44%

Yu et al. (2016), "Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness"

21 Baseline vs. UnFlow (KITTI 2015)
[Figure: input images, ground truth flow, baseline and UnFlow predicted flow, flow error maps (red = high, blue = low)]

22 Benchmarks (KITTI), Non-Finetuned
Comparing supervised networks vs. UnFlow (similar networks trained on synthetic domains):
- UnFlow reduces AEE by up to 49% (vs. FlowNetS, KITTI 2012)

Method                           | AEE (All) 2012 train | AEE (All) 2015 train
FlowNetS+ft [Dosovitskiy et al.] | 7.5                  |
FlowNet2-C [Ilg et al.]          |                      | 11.36
UnFlow-C [ours]                  | 3.78                 | 8.80

Dosovitskiy et al. (2015), "FlowNet: Learning Optical Flow with Convolutional Networks"
Ilg et al. (2017), "FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks"

23 Benchmarks (KITTI), Non-Finetuned
Comparing supervised networks vs. UnFlow:
- UnFlow even performs slightly better on off-domain data (Middlebury)

Method                           | AEE (All) 2012 train | AEE (All) 2015 train | AEE (All) Middlebury
FlowNetS+ft [Dosovitskiy et al.] | 7.5                  |                      | 0.98
FlowNet2-C [Ilg et al.]          |                      | 11.36                |
UnFlow-C [ours]                  | 3.78                 | 8.80                 | 0.88

Dosovitskiy et al. (2015), "FlowNet: Learning Optical Flow with Convolutional Networks"

24 Conclusion
UnFlow:
- Comprehensive unsupervised proxy loss
- 48% improvement over the brightness constancy baseline
- Outperforms synthetic off-domain supervision
Code open-sourced at https://github.com/simonmeister/unflow

25 Supplementary Slides

26 Supervised Fine-Tuning
1. Unsupervised training
2. (Optional) supervised fine-tuning on KITTI 2012 & 2015 train with ground truth flow

Method                         | AEE (All) 2012 test | Outliers 2015 test
FlowNet2-ft-kitti [Ilg et al.] | 1.8                 | 10.41%
UnFlow-CSS-ft                  | 1.7                 | 11.11%

Competitive fine-tuning performance without pre-training on specially generated synthetic datasets.
Ilg et al. (2017), "FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks"

27 Benchmarks (KITTI), Non-Finetuned
Comparing previous unsupervised networks vs. UnFlow (similar networks & training schedules):
- UnFlow reduces AEE by up to 66%

Method                   | AEE (All) 2012 train
UnsupFlownet [Yu et al.] | 11.3
DSTFlow [Ren et al.]     | 10.43
UnFlow-C [ours]          | 3.78

Yu et al. (2016), "Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness"
Ren et al. (2017), "Unsupervised Deep Learning for Optical Flow Estimation"

28 Loss Ablation (KITTI 2012)
Comparing baseline [Yu et al.] vs. UnFlow:
- Training on SYNTHIA instead of FlyingChairs slightly improves AEE
- Our baseline re-implementation is more accurate than the results reported by Yu et al. (AEE of 11.3 vs. our 8.26)

Data loss  | Smoothness | Occlusion              | AEE (All) | Outliers (All)
Brightness | 1st-order  |                        | 8.26      |
Brightness | 1st-order  |                        | 7.20      | 31.93%
Census     | 1st-order  |                        | 4.66      | 20.85%
Census     | 2nd-order  |                        | 4.40      | 17.22%
Census     | 2nd-order  | Forward-backward check | 3.78      | 16.44%

Yu et al. (2016), "Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness"

29 FlowNetS / UnsupFlownet / UnFlow-CSS
[Figure: KITTI 2012 flow error (white = high, black = low; occluded regions in red) and predicted flow for FlowNetS, UnsupFlownet, and UnFlow-CSS]

30 Baseline vs. UnFlow (KITTI 2015)
[Figure: input images, ground truth flow, baseline and UnFlow predicted flow, flow error maps (red = high, blue = low)]

31 UnFlow-CSS-ft (KITTI 2015 test)

32 UnFlow-CSS-ft (KITTI 2015 test)

33 Baseline vs. UnFlow (KITTI 2015)
[Figure: baseline vs. UnFlow flow error maps (red = high, blue = low)]

34 Baseline vs. UnFlow (KITTI 2015)
[Figure: baseline vs. UnFlow flow error maps (red = high, blue = low)]