Computer Vision: Making machines see

Similar documents
Making Machines See. Roberto Cipolla Department of Engineering. Research team

Computer Vision at Cambridge: Reconstruction,Registration and Recognition

Why study Computer Vision?

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009

Places Challenge 2017

Object Recognition II

Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization. Presented by: Karen Lucknavalai and Alexandr Kuznetsov

Contents I IMAGE FORMATION 1

6. Convolutional Neural Networks

Semantic Segmentation

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

What is Computer Vision? Introduction. We all make mistakes. Why is this hard? What was happening. What do you see? Intro Computer Vision

Why study Computer Vision?

Deep condolence to Professor Mark Everingham

Deep Models for 3D Reconstruction

Learning to generate 3D shapes

Computer Vision Lecture 17

Yiqi Yan. May 10, 2017

Computer Vision Lecture 17

Presented at the FIG Congress 2018, May 6-11, 2018 in Istanbul, Turkey

Deep Learning for Computer Vision with MATLAB By Jon Cherrie

Object Detection on Self-Driving Cars in China. Lingyun Li

Creating Affordable and Reliable Autonomous Vehicle Systems

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus

Internet of things that video

Call for Proposals Media Technology HIRP OPEN 2017

CS231N Section. Video Understanding 6/1/2018

Video Object Segmentation using Deep Learning

Computer Vision: Summary and Discussion

Deep Learning. Deep Learning. Practical Application Automatically Adding Sounds To Silent Movies

Deep Incremental Scene Understanding. Federico Tombari & Christian Rupprecht Technical University of Munich, Germany

List of Accepted Papers for ICVGIP 2018

Lecture 7: Semantic Segmentation

Semantic Video Indexing

Object Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal

Computer Vision. From traditional approaches to deep neural networks. Stanislav Frolov München,

Lecture 5: Object Detection

Attributes. Computer Vision. James Hays. Many slides from Derek Hoiem

2 OVERVIEW OF RELATED WORK

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta

Does the Brain do Inverse Graphics?

RGBd Image Semantic Labelling for Urban Driving Scenes via a DCNN

24 hours of Photo Sharing. installation by Erik Kessels

Does the Brain do Inverse Graphics?

Object Detection. Part1. Presenter: Dae-Yong

VISION FOR AUTOMOTIVE DRIVING

Learning Deep Structured Models for Semantic Segmentation. Guosheng Lin

Deformable Part Models

Structure from Motion. Introduction to Computer Vision CSE 152 Lecture 10

Nonlinear State Estimation for Robotics and Computer Vision Applications: An Overview

CS 4495 Computer Vision A. Bobick. Motion and Optic Flow. Stereo Matching

Deep Learning for Virtual Shopping. Dr. Jürgen Sturm Group Leader RGB-D

Structured Models in. Dan Huttenlocher. June 2010

Learning Semantic Environment Perception for Cognitive Robots

Topics to be Covered in the Rest of the Semester. CSci 4968 and 6270 Computational Vision Lecture 15 Overview of Remainder of the Semester

Multi-View 3D Object Detection Network for Autonomous Driving

INTRODUCTION TO DEEP LEARNING

AUTOMATIC VIDEO INDEXING

Joint Inference in Image Databases via Dense Correspondence. Michael Rubinstein MIT CSAIL (while interning at Microsoft Research)

Unified, real-time object detection

Why equivariance is better than premature invariance

Detection III: Analyzing and Debugging Detection Methods

Towards Large-Scale Semantic Representations for Actionable Exploitation. Prof. Trevor Darrell UC Berkeley

Machine Learning 13. week

Semantic RGB-D Perception for Cognitive Robots

Associating video frames with text

Object Recognition. Lecture 11, April 21 st, Lexing Xie. EE4830 Digital Image Processing

Content-Based Image Recovery

3D Scanning. Qixing Huang Feb. 9 th Slide Credit: Yasutaka Furukawa

Multi-Class Segmentation with Relative Location Prior

CMU Lecture 18: Deep learning and Vision: Convolutional neural networks. Teacher: Gianni A. Di Caro

Computer Vision with MATLAB MATLAB Expo 2012 Steve Kuznicki

Depth. Common Classification Tasks. Example: AlexNet. Another Example: Inception. Another Example: Inception. Depth

Su et al. Shape Descriptors - III

Colorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science.

Learning and Recognizing Visual Object Categories Without First Detecting Features

CAP 5415 Computer Vision. Fall 2011

Vision based autonomous driving - A survey of recent methods. -Tejus Gupta

Colorado School of Mines. Computer Vision. Professor William Hoff Dept of Electrical Engineering &Computer Science.

Augmenting Reality, Naturally:

Paper Motivation. Fixed geometric structures of CNN models. CNNs are inherently limited to model geometric transformations

Towards Fully-automated Driving. tue-mps.org. Challenges and Potential Solutions. Dr. Gijs Dubbelman Mobile Perception Systems EE-SPS/VCA

Computer Vision I - Introduction

Turning an Automated System into an Autonomous system using Model-Based Design Autonomous Tech Conference 2018

SEMANTIC segmentation has a wide array of applications

Stereo. 11/02/2012 CS129, Brown James Hays. Slides by Kristen Grauman

Vision is inferential. (

The Hilbert Problems of Computer Vision. Jitendra Malik UC Berkeley & Google, Inc.

CS 4495 Computer Vision A. Bobick. Motion and Optic Flow. Stereo Matching

Multiple Model Estimation : The EM Algorithm & Applications

Stereo. Outline. Multiple views 3/29/2017. Thurs Mar 30 Kristen Grauman UT Austin. Multi-view geometry, matching, invariant features, stereo vision

Solution: filter the image, then subsample F 1 F 2. subsample blur subsample. blur

視覚情報処理論. (Visual Information Processing ) 開講所属 : 学際情報学府水 (Wed)5 [16:50-18:35]

An Evaluation of Volumetric Interest Points

Markov Networks in Computer Vision

Recognizing Deformable Shapes. Salvador Ruiz Correa (CSE/EE576 Computer Vision I)

A Deep Learning Approach to Vehicle Speed Estimation

Object Detection Based on Deep Learning

Dynamic Routing Between Capsules. Yiting Ethan Li, Haakon Hukkelaas, and Kaushik Ram Ramasamy

Automatic Thoracic CT Image Segmentation using Deep Convolutional Neural Networks. Xiao Han, Ph.D.

Transcription:

Computer Vision: Making machines see Roberto Cipolla Department of Engineering http://www.eng.cam.ac.uk/~cipolla/people.html http://www.toshiba.eu/eu/cambridge-research- Laboratory/

Vision: what is where by looking Cognitive Systems Engineering

Computer Vision What?

Computer Vision What?

Real-time application

Overview 1. Background: why and how? 2. 3R s of Computer Vision: - Reconstruction - Registration - Recognition

1. How to make machines that see?

How? Introduction

1 Geometry - Perspective

2 Probabilistic framework Perception is our best guess as to what is in the world, given our current sensory input and our prior experience. Helmholtz (1988) 1. Deal with the ambiguity of the visual world 2. Are able to fuse information 3. Have the ability to learn

3 Machine Learning

2 Computer Vision at Cambridge

Computer Vision: 3R s Reconstruction Recognition Registration Reconstruction: Recover 3D shape Recognition: Identify objects (example) Registration: Compute their position and pose

Computer Vision: 3R s Reconstruction Recognition Registration Reconstruction: Recover 3D shape Recognition: Identify objects (example) Registration: Compute their position and pose

Reconstruction? Recovery of 3D shape from images

Reconstruction

Ambiquity in a single view O

Stereo vision O e e' O'

Stereo vision 3D point

Multi-view stereo p 1 p 4 p 3 p 2 minimize f (R,T,P) p 5 p 6 p 7 Camera 1 Camera 3 R 1,t 1 Camera 2 R 3,t 3 R 2,t 2

Structure from motion Input sequence 2D features 2D track 3D points

Structure from motion Input sequence 2D features 2D track 3D points

Structure from motion Input sequence 2D features 2D track 3D points

Structure from motion Input sequence 2D features 2D track 3D points

3D MRF for 3D modelling

3D Models

Large Scale Reconstruction

Deformable objects: Real-time photometric stereo using colour lighting

Textureless deforming objects a method for reconstructing a textureless deforming object in 2.5d

Colour Photometric Stereo

Real-time deformable surfaces

Sample Reconstructions

Registration? Target detection and pose estimation

Registration: Expressive Visual Text-to- Speech

Registration alignment of training data

What is an expressive talking head? > User inputs a sentence which they wish to be uttered > User specifies an emotion Video output is generated

Our current talking head

Expressive Visual Text to Speech

Demo XpressiveTalk

3D Registration - Magic Mirrors

Registration Body shape

Single-shot Body Shape

Single-shot Body Shape

Single-shot Body Shape

Recognition?

road Recognition image classification categorical object detection horses airplanes background semantic segmentation tree bicycle building grass dog car sky building road

Deep Learning - Class Recognition with CNN 3072 4096 96 feat. of 11x11 size, 2x2 pool size 256 feat. of 5x5 size, 2x2 pool size 512 feat. of 3x3 size 1024 feat. of 3x3 size 1024 feat. of 3x3 size, 2x2 pool size Soft-max classification Layer Convolutional Layer Max pool + Max pool + Max pool + Cat Dog Horse Bird Convolution with features Rectification (non-linearity) Local Pooling & Subsampling Max pool + 2 fully connected layers W W represents the trainable parameters (features) in a layer

SegNet Architecture Highlights: Learns to extract features using an encoder network (e.g. VGG16) and maps features to pixel wise labels using a decoder network. Decoders uses the stored pooling indices in the encoding layer to enable upsampling its input to double the resolution. Non-linear upsampling using pooling indices maintains shape of categories, and Reduces the number of parameters in the decoder network by a large margin as compared to other recent architectures.

SegNet training from labelled data

SegNet predictions on unseen test images - DEMO

SegNet Real-time DEMO

Why? Applications

Summary Computer Vision 1. Background: why and how? 2. 3R s of Computer Vision: - Registration - Reconstruction - Recognition

More information Publications: http://mi.eng.cam.ac.uk/~cipolla/publications.htm Research demos and code: http://mi.eng.cam.ac.uk/projects/segnet/ http://mi.eng.cam.ac.uk/projects/relocalisation/ Research Videos: https://www.youtube.com/user/computervisionvideos