Computer Vision: Making machines see

1 Computer Vision: Making machines see Roberto Cipolla Department of Engineering Laboratory/

2 Vision: what is where by looking Cognitive Systems Engineering

3 Computer Vision What?

5 Real-time application

6 Overview 1. Background: why and how? 2. The 3Rs of Computer Vision: - Reconstruction - Registration - Recognition

7 1. How to make machines that see?

8 How? Introduction

9 1 Geometry - Perspective

10 2 Probabilistic framework: "Perception is our best guess as to what is in the world, given our current sensory input and our prior experience." Helmholtz (1866). Probabilistic models: 1. deal with the ambiguity of the visual world; 2. are able to fuse information; 3. have the ability to learn.
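
The "best guess" idea on this slide can be read as Bayes' rule: prior experience is combined with the likelihood of each sensory cue to give a posterior over what is in the world. The sketch below is illustrative only and is not taken from the slides. The state names, the prior, and the likelihood values are made-up assumptions, and the two cues are assumed to be conditionally independent given the state.

```python
def posterior(prior, *likelihoods):
    """Fuse a prior with one or more cue likelihoods using Bayes' rule.

    prior and each likelihood map state -> probability (hypothetical values).
    Cues are assumed conditionally independent given the state.
    """
    unnormalised = {}
    for state, p in prior.items():
        for likelihood in likelihoods:
            p *= likelihood[state]
        unnormalised[state] = p
    total = sum(unnormalised.values())
    return {state: p / total for state, p in unnormalised.items()}


# Hypothetical example: is a surface patch convex or concave?
prior = {"convex": 0.5, "concave": 0.5}        # prior experience
shading_cue = {"convex": 0.6, "concave": 0.4}  # weak, ambiguous cue
stereo_cue = {"convex": 0.9, "concave": 0.1}   # stronger cue
fused = posterior(prior, shading_cue, stereo_cue)
```

Each cue alone leaves some ambiguity; multiplying the likelihoods fuses them, and the prior is where learned experience would enter.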

11 3 Machine Learning

12 2 Computer Vision at Cambridge

13 Computer Vision: the 3Rs. Reconstruction: recover 3D shape. Recognition: identify objects (example). Registration: compute their position and pose.

15 Reconstruction? Recovery of 3D shape from images

16 Reconstruction

17 Ambiguity in a single view (figure label: O)

18 Stereo vision (figure labels: O, e, e', O')

19 Stereo vision 3D point
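
As a hedged illustration of how two views remove the single-view ambiguity, the sketch below recovers a 3D point for the special case of a rectified stereo pair (parallel cameras separated by a horizontal baseline). This special case is an assumption made here for brevity; the slides show the general two-view geometry. The function name, parameter names, and numeric values are hypothetical.

```python
def triangulate_rectified(x_left, x_right, y, focal_px, baseline_m, cx, cy):
    """Recover (X, Y, Z) in the left-camera frame from a rectified stereo match.

    x_left, x_right: column of the same point in the left and right image.
    y: row of the point (identical in both images after rectification).
    focal_px: focal length in pixels; baseline_m: camera separation in metres.
    cx, cy: principal point in pixels.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    Z = focal_px * baseline_m / disparity  # depth is inversely proportional to disparity
    X = (x_left - cx) * Z / focal_px
    Y = (y - cy) * Z / focal_px
    return X, Y, Z


point = triangulate_rectified(
    x_left=420.0, x_right=400.0, y=240.0,
    focal_px=800.0, baseline_m=0.1, cx=320.0, cy=240.0,
)
```

A single image fixes only the ray through the pixel; the disparity between the two images selects the depth along that ray.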

20 Multi-view stereo: minimize f(R, T, P) over the camera poses (R_1, t_1), (R_2, t_2), (R_3, t_3) and the 3D points p_1 to p_7 (figure: Camera 1, Camera 2 and Camera 3 viewing points p_1 to p_7)
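
The slide does not define f(R, T, P). A common choice, assumed here, is the sum of squared reprojection errors: the distance between where each 3D point is observed in each image and where the current cameras and points predict it. The sketch below evaluates that cost only and does not perform the minimization; the pinhole model with unit focal length and all names and values are illustrative assumptions.

```python
def project(R, t, P, focal=1.0):
    """Pinhole projection of 3D point P into a camera with rotation R, translation t."""
    # Transform the point into the camera frame: Pc = R * P + t
    Pc = [sum(R[r][c] * P[c] for c in range(3)) + t[r] for r in range(3)]
    return (focal * Pc[0] / Pc[2], focal * Pc[1] / Pc[2])


def reprojection_error(cameras, points, observations):
    """Sum of squared distances between observed and predicted image points.

    cameras: list of (R, t) pairs; points: list of 3D points;
    observations: dict mapping (camera_index, point_index) -> observed (u, v).
    """
    total = 0.0
    for (i, j), (u, v) in observations.items():
        R, t = cameras[i]
        u_hat, v_hat = project(R, t, points[j])
        total += (u - u_hat) ** 2 + (v - v_hat) ** 2
    return total


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cameras = [(identity, [0.0, 0.0, 0.0]), (identity, [-1.0, 0.0, 0.0])]
points = [[0.0, 0.0, 4.0], [1.0, 1.0, 2.0]]

# Noise-free observations generated from the same model, so the cost is zero.
observations = {(i, j): project(R, t, P)
                for i, (R, t) in enumerate(cameras)
                for j, P in enumerate(points)}
cost_at_truth = reprojection_error(cameras, points, observations)

# Moving one 3D point away from its true position increases the cost.
perturbed_points = [[0.1, 0.0, 4.0], points[1]]
cost_perturbed = reprojection_error(cameras, perturbed_points, observations)
```

In practice a non-linear least-squares solver would adjust the camera poses and the points together to drive this cost down.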

21 Structure from motion: Input sequence -> 2D features -> 2D track -> 3D points

25 3D MRF for 3D modelling

26 3D Models

27 Large Scale Reconstruction

28 Deformable objects: Real-time photometric stereo using colour lighting

29 Textureless deforming objects: a method for reconstructing a textureless deforming object in 2.5D

30 Colour Photometric Stereo

31 Real-time deformable surfaces

32 Sample Reconstructions

33 Registration? Target detection and pose estimation

34 Registration: Expressive Visual Text-to-Speech

35 Registration: alignment of training data

36 What is an expressive talking head? 1. The user inputs a sentence to be uttered. 2. The user specifies an emotion. 3. Video output is generated.

37 Our current talking head

38 Expressive Visual Text-to-Speech

39 Demo XpressiveTalk

40 3D Registration - Magic Mirrors

41 Registration: Body shape

42 Single-shot Body Shape

45 Recognition?

46 Recognition: image classification, categorical object detection, semantic segmentation (figure labels: horses, airplanes, background, tree, bicycle, building, grass, dog, car, sky, road)

47 Deep Learning - Class Recognition with CNN. Convolutional layers: feat. of 11x11 size, 2x2 pool size; 256 feat. of 5x5 size, 2x2 pool size; 512 feat. of 3x3 size; 1024 feat. of 3x3 size; 1024 feat. of 3x3 size, 2x2 pool size. These are followed by max pool + 2 fully connected layers and a soft-max classification layer (Cat, Dog, Horse, Bird). Each convolutional layer applies convolution with features, rectification (non-linearity), and local pooling & subsampling. W represents the trainable parameters (features) in a layer. (figure label: 3072)
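
The per-layer operations named on this slide (convolution with features, rectification, local pooling, and a final soft-max) can be sketched on a toy single-channel image. This is a minimal illustration with a hand-picked kernel and made-up class scores, not the network from the slide; in a trained CNN the kernel weights W are learned and there are many channels and layers.

```python
import math


def conv2d(image, kernel):
    """'Valid' 2D convolution as used in CNNs (cross-correlation, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)] for r in range(out_h)]


def relu(fmap):
    """Rectification: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Local pooling and subsampling: keep the maximum of each size x size window."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


def softmax(scores):
    """Turn class scores into probabilities that sum to one."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 5x5 image and a hypothetical 2x2 vertical-edge kernel.
image = [[float((r * 5 + c) % 7) for c in range(5)] for r in range(5)]
edge_kernel = [[1.0, -1.0], [1.0, -1.0]]
features = max_pool(relu(conv2d(image, edge_kernel)))

# Hypothetical scores for the classes Cat, Dog, Horse, Bird.
class_probs = softmax([2.0, 1.0, 0.5, 0.1])
```

The 5x5 input becomes a 4x4 feature map after the 2x2 convolution and a 2x2 map after pooling, which is the resolution loss that stacking such layers accumulates.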

48 SegNet Architecture Highlights: Learns to extract features using an encoder network (e.g. VGG16) and maps features to pixel-wise labels using a decoder network. Each decoder uses the pooling indices stored in the encoding layer to upsample its input to double the resolution. Non-linear upsampling using pooling indices maintains the shape of categories and reduces the number of parameters in the decoder network by a large margin compared to other recent architectures.
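
The upsampling step described on this slide can be sketched in a few lines: the encoder's max pooling records where each maximum came from, and the decoder writes each value back to that location at double the resolution. This is a toy single-channel sketch with hypothetical function names and values, not the SegNet implementation.

```python
def max_pool_with_indices(fmap, size=2):
    """Max pooling that also returns the (row, col) position of each maximum."""
    pooled, indices = [], []
    for r in range(0, len(fmap) - size + 1, size):
        pooled_row, index_row = [], []
        for c in range(0, len(fmap[0]) - size + 1, size):
            window = [(fmap[r + i][c + j], (r + i, c + j))
                      for i in range(size) for j in range(size)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def unpool(pooled, indices, out_h, out_w):
    """Place each pooled value back at its stored position; all other entries are zero."""
    out = [[0.0] * out_w for _ in range(out_h)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


fmap = [[1.0, 3.0, 2.0, 0.0],
        [4.0, 2.0, 1.0, 5.0],
        [0.0, 1.0, 7.0, 2.0],
        [6.0, 2.0, 3.0, 1.0]]
pooled, indices = max_pool_with_indices(fmap)  # encoder side: 4x4 -> 2x2
upsampled = unpool(pooled, indices, 4, 4)      # decoder side: 2x2 -> 4x4
```

The unpooling itself has no trainable parameters, which is where the parameter saving comes from; the sparse result would then be passed through trainable decoder convolutions to produce dense feature maps.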

49 SegNet training from labelled data

50 SegNet predictions on unseen test images - DEMO

51 SegNet Real-time DEMO

52 Why? Applications

53 Summary: Computer Vision 1. Background: why and how? 2. The 3Rs of Computer Vision: - Registration - Reconstruction - Recognition

54 More information Publications: Research demos and code: Research Videos:
