Computer Vision: A Challenge in Artificial Intelligence



1 Computer Vision: A Challenge in Artificial Intelligence (German title: Computer Vision eine Herausforderung in der Künstlichen Intelligenz)
Prof. Carsten Rother, Computer Vision Lab Dresden, Institute of Artificial Intelligence
11/12/2013 Computer Vision a hard case for AI

2 Roadmap for this lecture
- A few more words on the history of AI and subareas of AI
- An introduction to Computer Vision
  - What is it?
  - Why is it hard?
  - How can we solve it?
  - What can we do with it?
- Roadmap for the remaining lecture

4 From first lecture

5 Going back to 1973: Sir James Lighthill's report to the British Parliament
- "The general purpose robot is a mirage" (German on the slide: "Ein Roboter der alles kann ist eine Illusion")
- Full report on YouTube:

6 Going back to 1973: Sir James Lighthill's report to the British Parliament
- He specifically mentioned the problem of "combinatorial explosion" or "intractability", which implied that many of AI's most successful algorithms would grind to a halt on real-world problems and were only suitable for solving "toy" versions.

7 What do we have today? (Personal conclusion)
- Lighthill was correct: we do not have the general-purpose robot.
- AI research has split into many sub-areas and related areas: Machine Learning, Computer Vision, ... (more later)
- In some areas we are doing a very good job: Natural Language Processing (NLP), playing chess
- Some areas turned out to be very hard: Robotics. Computer Vision seems to be one of the hardest (a few success stories come later)

8 Scene understanding in the 70s [Sussman, Lamport, Guzman 1966] [Slide credits: Andrew Blake]

9 Scene understanding today: we are getting there, 40 years later [Xiao et al. NIPS 2012]

10 Today: Topics / Subareas in AI [derived from first lecture]
- Applications: Natural Language Processing, Planning, Computer Vision, Robotics, Biology, Human-Computer Interaction
- Algorithms: Search, Discrete Optimization, Continuous Optimization, Probabilistic Inference, Learning
- Theory: Logic, Machine Learning, Probability Theory, Decision Theory, Automated Reasoning
- Models: Knowledge representation, Undirected graphical models, Directed graphical models, Unstructured models
Observations:
- AI overlaps with many other disciplines
- There is not one unique, overarching theory
- AI has impact in many domains

12 Books for the following lecture
- Artificial Intelligence: A Modern Approach, Russell, Norvig (Third Edition, English); we cover (parts of) sections 4, 5, 6
- Pattern Recognition and Machine Learning, Bishop. Springer
- Learning from Data: A Short Course, Abu-Mostafa, Magdon-Ismail, Hsuan-Tien Lin. AMLbook
- Markov Random Fields for Vision and Image Processing, Blake, Kohli, Rother. MIT Press 2011
- Computer Vision: Algorithms and Applications, Szeliski, Springer. An earlier version of the book is online:

13 Roadmap for this lecture
- A few more words on the history of AI and subareas of AI
- An introduction to Computer Vision
  - What is it?
  - Why is it hard?
  - How can we solve it?
  - What can we do with it?
- Roadmap for the remaining lecture

14 What is computer Vision? (Potential) Definition: Developing computational models and algorithms to interpret digital images and visual data in order to understand the visual world we live in. 11/12/2013 Computer Vision I: Introduction 14


16 What does it mean to understand?
(Potential) Definition: Developing computational models and algorithms to interpret digital images and visual data in order to understand the visual world we live in.
Physics-based vision:
- Geometry
- Segmentation
- Camera parameters
- Emitted light (sun)
- Surface properties: reflectance, material
Semantic-based vision:
- Objects: class, pose
- Scene: outdoor, ...
- Attributes/Properties: old-fashioned train, A-on-top-of-B

17-27 Image-formation model (built up step by step over these slides) [Slide credits: John Winn, ICML 2008]
An image has very many sources of variability:
- Scene type (example: street scene)
- Scene geometry
- Object classes (labels in the example: Sky, Building 3, Road, Sidewalk, Bicycle, Tree 3, Car 5, Person 4, Bench, Bollard)
- Object position
- Object orientation
- Object shape
- Depth/occlusions
- Object appearance
- Illumination
- Shadows
- Motion blur
- Camera effects

28 The Scene Parsing challenge: a grand challenge of computer vision
Single image -> (Probabilistic) Script = {Camera, Light, Geometry, Material, Objects, Scene, Attributes, Others}

29 Why is scene parsing hard?
- Computer graphics: from a rich 3D representation, Script = {Camera, Light, Geometry, Material, Objects, Scene, Attributes, Others}, to a 2D pixel representation
- Computer vision: from the 2D pixel representation back to the rich 3D representation
- Computer vision can be seen as inverse graphics

30 Example of a recent work: from the input image to a scene graph [Gupta, Efros, Hebert, ECCV 10]

31 Example: General object recognition & segmentation, good results [TextonBoost; Shotton et al. 06]

32 Example: General object recognition & segmentation, failure cases [TextonBoost; Shotton et al. 06]

33 Comparison: CV to NLP
Natural Language Processing (real-time speech translation):
- Amount of input data: audiobooks have 2.2 words per second, i.e. ~20 letters per second
- Sound is 1D
- Strong rules (context-free grammars exist)
- Real-time speech translation exists, more or less
Computer Vision (scene understanding):
- Amount of input data: 10 Mpixel/second for a robot
- Images are 2D (much harder inference!)
- Rules/models are hard to define since images are so varied (see next lecture)
- Scene understanding is far from being solved: the best method has a 47% chance of being correct for 20 object classes

35 What is Computer Vision? (Potential) Definition: Developing computational models and algorithms to interpret digital images and visual data in order to understand the visual world we live in.

36 Visual data is everywhere
Visual data is dense, structured data.
- Real world: RGB photo/video cameras, mobile phones, depth cameras, laser scanners
- Robotics
- Medicine
- Microscopy
- Surveillance
- Cars
- Web search
- Physics simulations

37 How can we interpret visual data?
- Computer graphics maps the rich 3D representation, Script = {Camera, Light, Geometry, Material, Objects, Scene, Attributes, Others}, to a 2D pixel representation; computer vision goes the other way
- What general (prior) knowledge of the world (not necessarily visual) can be exploited?
- What properties / cues from the image can be used?
- Both aspects are quite well understood (a lot is based on physics), but how to use them efficiently is an open challenge (see later)

39 Prior knowledge (examples)
Hard prior knowledge:
- Trains do not fly in the air
- Objects are connected in 3D
Soft prior knowledge:
- The camera is more likely 1.70m above ground and not 0.1m
- Self-similarity: all black pixels belong to the same object

40 Prior knowledge that is harder to describe
- Describe image texture: a zoom into a real image versus a zoom into an image that is not real
- Microscopic images: what is the true shape of these objects?

41 The importance of prior knowledge: which patch is brighter, A or B? [Edward Adelson]

43 The importance of prior knowledge (patches A and B)
- 2D image, local view: what the computer sees
- 3D with ambient light (true colours in the 3D world): an unlikely 3D representation (hard to see for a human)
- 3D with direct light (true colours in the 3D world): the most likely 3D representation. This is what humans see implicitly.
Ideally the computer sees the same.

44 The importance of prior knowledge (figure: light, 2D image, 3D representation)
- Humans do not see an image as a set of 2D pixels. They understand an image as a projection of the 3D world we live in.
- Humans have prior knowledge about the world encoded, such as:
  - Light casts shadows
  - Objects do not fly in the air
  - A car is likely to move, but a table is unlikely to move
- We have to teach the computer this prior knowledge so that it understands 2D images as pictures of the 3D world

45 Male or Female?

46 How can we interpret visual data?
- Computer graphics maps the rich 3D representation, Script = {Camera, Light, Geometry, Material, Objects, Scene, Attributes, Others}, to a 2D pixel representation; computer vision goes the other way
- What general (prior) knowledge of the world (not necessarily visual) can be exploited?
- What properties / cues from the image can be used?
- Both aspects are quite well understood (a lot is based on physics), but how to use them efficiently is an open challenge (see later)

47 Cue: Appearance (colour, texture) for object recognition. To which object does the patch belong?

48 Cue: Outlines (shape) for object recognition

49 Guess the object: colour, texture, shape [from John Winn, ICML 2008]

50 Cue: Context for object recognition

52 Cue: Stereo vision (2 frames) for geometry estimation (ground truth versus algorithmic output)

53 Cue: Multiple frames for geometry estimation

54 Cue: Shading & shadows for geometry and light estimation

55 Cue: Texture gradient for geometry estimation

56 The Scene Parsing challenge: a grand challenge of computer vision
Single image -> (Probabilistic) Script = {Camera, Light, Geometry, Material, Objects, Scene, Attributes, Others}
Many applications do not have to extract the full probabilistic script but only a subset, e.g. "does the image contain a car?" Many examples come later.

57 Many application scenarios are in reach
To simplify the problem:
1) Richer input:
- Modern sensing technology
- Moving images
- User involvement
2) Rich data to learn from:
- Use the web
- Crowdsourcing to get labels (online games, Mechanical Turk)
- Powerful graphics engines

58 Real-time pedestrian detection

59 Animate the world [Chen et al. UIST 12]

60 Example: Xbox people tracking

61 Example: people tracking (test data)

62 Body tracking and gesture recognition have many applications
- Start-up 2012: try fashion online
- Very large impact in many fields: gaming, robotics, HCI, medicine, ...

63 Start-Up Company: Like.com

64 What is Computer Vision? (Potential) Definition: Developing computational models and algorithms to interpret digital images and visual data in order to understand the visual world we live in.

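65 Example: Image Segmentation
- Input: image with user input
- Output: a binary labelling $y \in \{0,1\}^n$ with one label per pixel; typically n is large (e.g. 1M)
- Undirected graphical model: a unary term $\theta_i(y_i)$ for each label $y_i$ and a pairwise term $\theta_{ij}(y_i, y_j)$ for each pair of neighbouring labels $y_i$, $y_j$
11/12/2013 Introducing the Computer Vision Lab Dresden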

66 Example: Image Segmentation
- Image with user input; labelling $y \in \{0,1\}^n$, typically n is large (e.g. 1M)
- Graphical model with unary terms $\theta_i(y_i)$ and pairwise terms $\theta_{ij}(y_i, y_j)$
- Modelling: how to formulate the graphical model, e.g. $P(y \mid \theta)$ (this is one of many tasks)
- Inference/Optimization: $y^* = \arg\max_y P(y \mid \theta)$ (this is one of many tasks)
- Learning: find optimal parameters $\theta$ (this is one of many tasks)
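The modelling and inference steps of the segmentation example can be made concrete on a toy image. The sketch below is only an illustration under stated assumptions: it assumes the common form $P(y \mid \theta) \propto \exp(-E(y))$ with $E(y) = \sum_i \theta_i(y_i) + \sum_{(i,j)} \theta_{ij}(y_i, y_j)$, a Potts pairwise term on a 4-connected grid, and made-up unary costs. The names `grid_edges`, `energy` and `map_labelling` are hypothetical and do not come from the lecture.

```python
from itertools import product


def grid_edges(height, width):
    """Return the 4-connected neighbour pairs of a height x width pixel grid."""
    edges = []
    for r in range(height):
        for c in range(width):
            i = r * width + c
            if c + 1 < width:
                edges.append((i, i + 1))      # right neighbour
            if r + 1 < height:
                edges.append((i, i + width))  # neighbour below
    return edges


def energy(y, unary, edges, smoothness):
    """E(y): sum of unary costs plus a Potts penalty per disagreeing edge."""
    e = sum(unary[i][label] for i, label in enumerate(y))
    e += sum(smoothness for i, j in edges if y[i] != y[j])
    return e


def map_labelling(unary, edges, smoothness):
    """Exhaustive search over all 2^n labellings (only feasible for tiny n).

    Under the assumed model, maximising P(y | theta) equals minimising E(y).
    """
    return min(product((0, 1), repeat=len(unary)),
               key=lambda y: energy(y, unary, edges, smoothness))


# Hypothetical 2 x 3 image: unary[i] = (cost of label 0, cost of label 1).
# Pixel 4 weakly prefers label 1, but two of its three neighbours prefer 0.
unary = [(0.0, 2.0), (0.0, 2.0), (2.0, 0.0),
         (0.0, 2.0), (1.2, 1.0), (2.0, 0.0)]
edges = grid_edges(2, 3)
print(map_labelling(unary, edges, smoothness=1.0))  # (0, 0, 1, 0, 0, 1)
```

With n around 1M pixels the search space has $2^n$ labellings, so exhaustive search is out of the question. This is why inference algorithms are a topic of their own.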

67 What is Learning?
Training:
- Given: image and ground truth
- Probabilistic model: $P(y \mid \theta)$
- Error function that says how we compare results
- Find weights $\theta$ (can be up to 10M parameters)
Testing:
- Inference, maximum probability: $y^* = \arg\max_y P(y \mid \theta)$

68 Model versus Inference (Algorithm)
- Input: image sequence [Data courtesy of Oliver Woodford]
- Output: new view
- Model: minimize a binary 4-connected undirected graphical model (choose a colour-mode at each pixel) [Fitzgibbon et al. 03]

69 Another Example: Model versus Algorithm
Results of different algorithms on the same model, compared with the ground truth:
- Graph Cut with truncation [Rother et al. 05] (approximate solution)
- Belief Propagation (approximate solution)
- ICM, Simulated Annealing (approximate solution)
- QPBOP [Boros et al. 06; Rother et al. 07] (exact solution)
Why is the result not perfect? Model or inference?
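The difference between an approximate and an exact solution of the same model can be reproduced on a toy problem. The following sketch is an illustration only, not the experiment on the slide: it uses a hypothetical 4-pixel chain with a Potts smoothness term and made-up costs, and compares ICM (iterated conditional modes, started from the best unary labels) with exhaustive search. The function names are assumptions.

```python
from itertools import product


def energy(y, unary, edges, smoothness):
    """Unary costs plus a Potts penalty for every edge with differing labels."""
    e = sum(unary[i][label] for i, label in enumerate(y))
    e += sum(smoothness for i, j in edges if y[i] != y[j])
    return e


def exact_minimum(unary, edges, smoothness):
    """Global minimum by exhaustive search (tiny problems only)."""
    return min(product((0, 1), repeat=len(unary)),
               key=lambda y: energy(y, unary, edges, smoothness))


def icm(unary, edges, smoothness, max_sweeps=100):
    """ICM: greedily change one label at a time while the energy decreases."""
    # Start from the labelling that minimises the unary terms alone.
    y = [0 if cost0 <= cost1 else 1 for cost0, cost1 in unary]
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(y)):
            best_label = y[i]
            best_energy = energy(y, unary, edges, smoothness)
            for label in (0, 1):
                candidate = y[:i] + [label] + y[i + 1:]
                e = energy(candidate, unary, edges, smoothness)
                if e < best_energy:
                    best_label, best_energy = label, e
            if best_label != y[i]:
                y[i] = best_label
                changed = True
        if not changed:
            break  # local minimum: no single-label change helps
    return tuple(y)


# Hypothetical chain: the end pixels strongly prefer 0, the middle ones weakly 1.
unary = [(0.0, 10.0), (1.0, 0.0), (1.0, 0.0), (0.0, 10.0)]
edges = [(0, 1), (1, 2), (2, 3)]
smoothness = 1.5

print(icm(unary, edges, smoothness))            # (0, 1, 1, 0), energy 3.0
print(exact_minimum(unary, edges, smoothness))  # (0, 0, 0, 0), energy 2.0
```

ICM stops at a local minimum because lowering the energy here requires flipping both middle pixels at once. The model is identical in both runs, so the gap is caused by inference alone.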

70 Summary: The key questions for the upcoming lectures
- What is the modelling language? Undirected / directed graphical models; unstructured models
- What does the model look like? What is the structure? What do the functions look like?
- Can we learn the model from data? Learn structure; learn potential functions; probabilistic learning / discriminative learning
- How do we optimize the model (perform inference)? Fast, approximate; marginals; exactly solvable?

71 Is Machine Learning feasible?
- We are looking at a mapping from $X = \{0,1\}^3$ to $Y = \{0,1\}$
- We are given 5 training data instances
[example from the book Learning from Data; Abu-Mostafa et al.]

72 Is Machine Learning feasible?
- What is the value for the remaining 3 data points?
[example from the book Learning from Data; Abu-Mostafa et al.]

73 Is Machine Learning feasible?
- Let us look at all possible functions: $f(x_1, x_2, x_3) = y$
- In total there are $2^{2^3} = 256$ possible functions
- With the 5 training data points fixed, $2^3 = 8$ functions remain
- Without any information about f, any solution for f is equally good!
- We need information about f
[example from the book Learning from Data; Abu-Mostafa et al.]
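The counting argument can be checked by enumeration. In the sketch below the five training labels are made up for illustration, since the transcript does not show the actual training data; the counts do not depend on which labels are chosen.

```python
from itertools import product

# The 8 points of X = {0,1}^3 and every function f: X -> {0,1},
# each function represented by its table of 8 output bits.
inputs = list(product((0, 1), repeat=3))
all_functions = list(product((0, 1), repeat=len(inputs)))

# Hypothetical training set: 5 labelled points (labels are an assumption).
training = {
    (0, 0, 0): 0,
    (0, 0, 1): 1,
    (0, 1, 0): 1,
    (0, 1, 1): 0,
    (1, 0, 0): 1,
}


def consistent(table):
    """True if the function given by this output table fits all training data."""
    f = dict(zip(inputs, table))
    return all(f[x] == y for x, y in training.items())


remaining = [table for table in all_functions if consistent(table)]
print(len(all_functions))  # 256
print(len(remaining))      # 8

# On every unseen point the remaining functions split evenly between 0 and 1,
# so the training data alone says nothing about the unseen points.
for x in inputs:
    if x not in training:
        votes = sum(dict(zip(inputs, table))[x] for table in remaining)
        print(x, votes, len(remaining) - votes)
```

The even split on every unseen point is the reason why additional information about f is needed.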

74 Is Machine Learning feasible?
- Assume f is smooth in the 3D space $(x_1, x_2, x_3)$, i.e. it has few 0-1 transitions in Manhattan space (neighbourhood drawn by lines)
- 6 transitions (optimal)
- 9 transitions (less good)
- 12 transitions (worst)
[example from the book Learning from Data; Abu-Mostafa et al.]
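The smoothness measure can be sketched as a count of label changes along the 12 edges of the cube $\{0,1\}^3$. The code below is a minimal illustration with hypothetical helper names and three hand-picked functions. It does not reproduce the slide's candidates, which depend on training data that the transcript does not show.

```python
from itertools import product

inputs = list(product((0, 1), repeat=3))


def cube_edges():
    """Pairs of cube corners that differ in exactly one coordinate (12 edges)."""
    return [(a, b) for a in inputs for b in inputs
            if a < b and sum(u != v for u, v in zip(a, b)) == 1]


def transitions(f):
    """Number of cube edges whose two end points get different labels under f."""
    return sum(f[a] != f[b] for a, b in cube_edges())


constant = {x: 0 for x in inputs}          # same label everywhere
half_space = {x: x[0] for x in inputs}     # label equals the first coordinate
parity = {x: sum(x) % 2 for x in inputs}   # label flips along every edge

print(len(cube_edges()))        # 12
print(transitions(constant))    # 0
print(transitions(half_space))  # 4
print(transitions(parity))      # 12
```

The parity function reaches the maximum of 12, which matches the "worst" case on the slide. A smoothness prior would rank the functions that fit the training data by this count and prefer the one with the fewest transitions.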

75 Roadmap for this lecture
- A few more words on the history of AI and subareas of AI
- An introduction to Computer Vision
  - What is it?
  - Why is it hard?
  - How can we solve it?
  - What can we do with it?
- Roadmap for the remaining lecture

76 Roadmap for next lectures
- (1): Computer Vision a hard case for AI
- (2): Introduction to probability theory
- (1): Exercise: probability theory
- (2): Unstructured models: Decision theory
- 8.1 (1): Unstructured models: Probabilistic Learning
- 8.1 (2): Unstructured models: Discriminative Learning Intro
- 15.1 (1): Exercise: Learning
- 15.1 (2): Unstructured models: Discriminative Learning
Lecturers: Carsten Rother and Dimitri Schlesinger

77 Roadmap for next lectures
- 22.1 (1): Undirected Graphical models: Models and Inference
- 22.1 (2): Undirected Graphical models: Models and Inference
- 29.1 (1): Exercise: Learning
- 29.1 (2): Undirected Graphical models: Learning
- 5.2 (1): Directed Graphical models
- 5.2 (2): Wrap up; Putting theory to practice
Lecturers: Carsten Rother and Dimitri Schlesinger

78 Related Lectures in Master / Bachelor / Diploma
- Computer Vision 1: Algorithms and Applications (winter term; 2+2)
- Machine Learning (winter term; 2+2)
- Computer Vision 2: Models, Inference, and Learning (summer term; 4+2)
- Many seminars and practical sessions
- Topics for Bachelor, Master, Diploma Thesis
