Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

Similar documents
Learning Deep Structured Models for Semantic Segmentation. Guosheng Lin

Fuzzy Set Theory in Computer Vision: Example 3

Depth from Stereo. Dominic Cheng February 7, 2018

Computer Vision Lecture 16

Computer Vision Lecture 16

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

Deep Incremental Scene Understanding. Federico Tombari & Christian Rupprecht Technical University of Munich, Germany

Depth Learning: When Depth Estimation Meets Deep Learning

Know your data - many types of networks

Semantic Segmentation

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs

Structured Prediction using Convolutional Neural Networks

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus

arxiv: v1 [cs.cv] 31 Mar 2016

Inception Network Overview. David White CS793

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta

Depth Estimation from Single Image Using CNN-Residual Network

Computer Vision Lecture 16

Spatial Localization and Detection. Lecture 8-1

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

Yiqi Yan. May 10, 2017

RGBd Image Semantic Labelling for Urban Driving Scenes via a DCNN

arxiv: v1 [cs.cv] 29 Sep 2016

MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection

CNNS FROM THE BASICS TO RECENT ADVANCES. Dmytro Mishkin Center for Machine Perception Czech Technical University in Prague

ActiveStereoNet: End-to-End Self-Supervised Learning for Active Stereo Systems (Supplementary Materials)

RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

Fully Convolutional Network for Depth Estimation and Semantic Segmentation

YOLO9000: Better, Faster, Stronger

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Proceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol I IMECS 2018, March 14-16, 2018, Hong Kong

COMP9444 Neural Networks and Deep Learning 7. Image Processing. COMP9444 c Alan Blair, 2017

Flow-Based Video Recognition

Convolutional Neural Networks: Applications and a short timeline. 7th Deep Learning Meetup Kornel Kis Vienna,

Advanced Video Analysis & Imaging

Yelp Restaurant Photo Classification

Lecture 7: Semantic Segmentation

Real Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications

HENet: A Highly Efficient Convolutional Neural. Networks Optimized for Accuracy, Speed and Storage

VISION FOR AUTOMOTIVE DRIVING

CENG 783. Special topics in. Deep Learning. AlchemyAPI. Week 11. Sinan Kalkan

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University.

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Convolutional Neural Networks + Neural Style Transfer. Justin Johnson 2/1/2017

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU,

Deep Learning for Computer Vision II

CS231N Section. Video Understanding 6/1/2018

Study of Residual Networks for Image Recognition

Fully Convolutional Networks for Semantic Segmentation

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing

Supplementary Material for Zoom and Learn: Generalizing Deep Stereo Matching to Novel Domains

AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation

RDFNet: RGB-D Multi-level Residual Feature Fusion for Indoor Semantic Segmentation

arxiv: v1 [cs.cv] 17 Oct 2016

Channel Locality Block: A Variant of Squeeze-and-Excitation

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

INTRODUCTION TO DEEP LEARNING

Convolutional Neural Networks

LSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University

Efficient Segmentation-Aided Text Detection For Intelligent Robots

Deep Learning with Tensorflow AlexNet

Exploiting Depth from Single Monocular Images for Object Detection and Semantic Segmentation

Martian lava field, NASA, Wikipedia

Fei-Fei Li & Justin Johnson & Serena Yeung

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. By Joa õ Carreira and Andrew Zisserman Presenter: Zhisheng Huang 03/02/2018

Depth Estimation from a Single Image Using a Deep Neural Network Milestone Report

arxiv: v2 [cs.cv] 1 Sep 2016

AUTOMATIC 3D HUMAN ACTION RECOGNITION Ajmal Mian Associate Professor Computer Science & Software Engineering

Deconvolutions in Convolutional Neural Networks

Deeper Depth Prediction with Fully Convolutional Residual Networks

Facial Key Points Detection using Deep Convolutional Neural Network - NaimishNet

Deep Learning Explained Module 4: Convolution Neural Networks (CNN or Conv Nets)

Supplementary material for Analyzing Filters Toward Efficient ConvNet

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

Convolutional Neural Networks

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

EVALUATION OF DEEP LEARNING BASED STEREO MATCHING METHODS: FROM GROUND TO AERIAL IMAGES

ImageNet Classification with Deep Convolutional Neural Networks

Introduction to Neural Networks

Direct Methods in Visual Odometry

Mask R-CNN. Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018

Classification of objects from Video Data (Group 30)

Seminars in Artifiial Intelligenie and Robotiis

Joint Object Detection and Viewpoint Estimation using CNN features

POINT CLOUD DEEP LEARNING

CNN for Low Level Image Processing. Huanjing Yue

Deep Residual Learning

Computer Vision: Making machines see

Regionlet Object Detector with Hand-crafted and CNN Feature

In-Place Activated BatchNorm for Memory- Optimized Training of DNNs

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python.

Part Localization by Exploiting Deep Convolutional Networks

FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction

Intro to Deep Learning. Slides Credit: Andrej Karapathy, Derek Hoiem, Marc Aurelio, Yann LeCunn

Object detection with CNNs

Dense Tracking and Mapping for Autonomous Quadrocopters. Jürgen Sturm

Residual Networks And Attention Models. cs273b Recitation 11/11/2016. Anna Shcherbina

Smart Parking System using Deep Learning. Sheece Gardezi Supervised By: Anoop Cherian Peter Strazdins

Transcription:

Deep learning for dense per-pixel prediction Chunhua Shen The University of Adelaide, Australia

Image understanding

Classification error Convolution Neural Networks 0.3 0.2 0.1 Image Classification [Krizhevsky et al., 2012] 16.4% error (AlexNet) 0 [He et al., 2015] 3.6% error (ResNet) 2010 2011 2012 2013 2014 2015 ILSVRC year [Szegedt et al., 2014] 6.6% error (GoogLeNet) [Simonyan et al., 2014] 7.3% error (VGGNet) [Zeiler et al., 2013] 11.1% error

Google s best reported results 2016 Wider or Deeper: Revisiting the ResNet Model for Visual Recognition, arxiv:1611.10080

Image understanding

Image understanding

Depth Estimation From Single Monocular Images

Useful Depth Estimation From Single Monocular Images Scene understanding 3D modelling Benefit other vision tasks e.g., semantic labellings, pose estimations Challenging No reliable depth cues e.g., stereo correspondence, motion information

Deep$Convolutional$Neural$Fields convolutional neural fields 11/09/2015 19

Deep Prediction*examples:*NYU*v2 convolutional neural fields 38

Deep Prediction*examples:*Make*3D convolutional neural fields

Conclusion Deep convolutional neural fields for monocular image depth estimations Combine deep CNN and continuous CRF General learning framework Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields Fayao Liu, Chunhua Shen, Guosheng Lin CVPR2015 http://arxiv.org/abs/1502.07411

Monocular Depth Estimation with Augmented Ordinal Depth Relationships Motivation: Limited metric RGB-D data in diversity and quantity. Relative depth has been proven to be an informative cue. Relative depth can be easily acquired from vast stereo videos. Highlights: A new Relative Depth in Stereo (RDIS) dataset is proposed. Densely labelled relative depth using existing stereo matching methods. State-of-the-art results on benchmark Depth Estimation datasets.

Overview 1. Acquire relative depth from stereo videos. 2. Pretrain a deep ResNet with relative depths. 3. Finetune the ResNet with metric depths. 1 2 ResNet Predicted Depth 3

Post Initial Image Relative Depth Generation 1. Use the absolute difference (AD) matching cost and the semi-global matching (SGM) method to generate the initial disparity maps. 2. Post-process the disparity maps: Correct vague or missing boundaries of objects, and smooth disparities within objects and background. This is done by experienced workers from movie production companies.

Results State-of-the-art results on NYUD2 State-of-the-art results on KITTI

Semantic pixel labelling using FCN

RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

Existing approaches 1. Standard multi-layer CNNs, such as ResNet (a): producing low-resolution (down-sampled) feature maps; fine structures/details are lost. 2. Dilated convolutions (b): Resulting high-resolution and high-dimension feature maps; computationally expensive and huge memory consumption if generating large resolution output.

Our approach Exploits various levels of detail at different stages of convolutions and fuses them to obtain a high-resolution prediction without the need to maintain large intermediate feature maps

Highlights 1. Exploits features at multiple levels of abstraction for high-resolution output. RefineNet refines low-resolution (coarse) semantic features with fine-grained low-level features in a recursive manner to generate high-resolution semantic feature maps. Our model is flexible in that it can be cascaded and modified in various ways. 2. Effective gradient propagation with identity mappings through short and long range connections Our cascaded RefineNets can be effectively trained end-to-end, which is crucial for best prediction performance. All components in RefineNet employ residual connections with identity mappings, such that gradients can be directly propagated through short-range and long-range residual connections allowing for both effective and efficient end-to-end training. 3. Chained residual pooling We propose a new network component we call chained residual pooling which is able to capture background context from a large image region. It does so by efficiently pooling features with multiple window sizes and fusing them together with residual connections and learnable weights.

Flexible network architectures

Experiments Our source code is available at: https://github.com/guosheng/refinenet

15 FPS with 720P input on a single GPU

Low-level image processing with very deep FCN

Denoise

Super-resolution

Deblur

Enhancing JPEG images

Inpainting

Inpainting

. Superior results on Denoising, & Super resolution. Many other low-level image processing tasks:. Deblur. Dehaze Image Restoration Using Very Deep Fully Convolutional Encoder-Decoder Networks with Symmetric Skip Connections, X. Mao, C. Shen, Y. Yang, NIPS 2016.

Thanks. Questions?