What does it mean to see?

1 What does it mean to see? The engineering science of computer vision James L. Crowley Professor, I.N.P. Grenoble Projet PRIMA - Laboratory GRAVIR INRIA Rhône Alpes Grenoble, France 1

2 The Science of Computer Vision Computer Vision is a branch of engineering science (Simon 69) that has developed through a series of paradigms. Science: the elaboration of theories and models that explain and predict. Science is a method of investigation practiced by a community of scientists who share a set of paradigms. Paradigm: the problems and problem solutions adopted by a scientific community. T. S. Kuhn, The Structure of Scientific Revolutions, Univ. of Chicago Press, Chicago. H. A. Simon, The Sciences of the Artificial, MIT Press, Cambridge, Mass.

3 The Science of Computer Vision Computer vision is the science of machines that see. What does it mean for a "machine" to "see"? What are the paradigms of Computer Vision? 3

4 Paradigms for Computer Vision Early Paradigms: Blocks World and Scene Analysis; Symbolic Artificial Intelligence. Established Paradigms: 3D Reconstruction; Active Vision. Emerging Paradigms: Software Architectures for Vision Systems; Appearance Based Vision; Statistical Learning 4

5 What does it mean to see? Outline: The Science of Computer Vision Early Paradigms Geometric Foundations Active Vision Physics Based Vision: Color Models Real Time Tracking Systems Appearance Based Vision Possible Future Paradigms and Conclusions 5


7 The Blocks World Examples: The thesis of L. Roberts (MIT 1963) Pattern Classification and Scene Analysis (Duda and Hart, 1973) Line Labeling (Waltz, Huffman, etc.) Approach: 1) Structural recognition of polyhedral objects 2) Edge Detection (Roberts, Sobel, etc.) 3) Homogeneous Coordinates 4) Wire Frame 3-D Object Representation 7

8 The Blocks World Debates: Edge Detection vs Segmentation; Structural vs Syntactic vs Statistical Pattern Recognition. Hypotheses: the blocks world assumed static, rigid, planar surfaces with Lambertian albedo under diffuse illumination. Failures: the real world is composed of dynamic, deformable, non-planar objects with arbitrary albedo under arbitrary and varying illumination. Techniques from the blocks world were too fragile to be useful. 8

9 Symbolic Reasoning for Artificial Intelligence (Example semantic network: a Dinner Table contains a Plate; the Plate is next to a fork.) Problems: 1) Representing object knowledge 2) Representing knowledge of geometry and image formation 3) Inference techniques. Problem Solutions: 1) Frames and Rules 2) Production Systems, Theorem Proving 3) Prediction and Verification (Context) 9

10 Symbolic Reasoning for Artificial Intelligence Debates: Top down or bottom up; Structural or symbolic models; How to learn models. Failures: Unstable image representations; Computational complexity; Knowledge acquisition 10

11 Symbolic Reasoning for Artificial Intelligence Examples: Frames (Minsky, 1975) Interpretation Guided Segmentation (Barrow and Tenenbaum, 77) Visions (Hanson and Riseman, 1978) Rule based interpretation (McKeown 1985) Schema System (Draper 87) 11

12 The Marr Paradigm D. Marr, Vision, W. H. Freeman, San Francisco. (Plus other books by Grimson, Hildreth, Ullman, etc.) Concepts: a three layered architecture composed of the Primal Sketch, the 2 1/2 D Sketch, and Hierarchical Object Models. Inspiration from neuroscience. 12

13 The Marr Paradigm Problems: 1) Representing the Primal Sketch 2) Computing the 2 1/2 D Sketch 3) Interpolation 4) Object Model Representation Problem Solutions 1) Laplacian Pyramid 2) Zero Crossing Contours in the Laplacian of Gaussian 3) Shape from "X" 4) Generalized Cylinders 13

14 Vision as Reconstruction Debates: 1) Object or Camera centered reference frame? 2) Biological relevance of Computational Models 3) Fusion and Integration of Shape from "X". Marr insisted that the invariant in vision is the 3D world. This has led to 15 years of research on vision as exact 3-D reconstruction from multiple images. 14

15 What does it mean to see? Outline: The Science of Computer Vision Early Paradigms Geometric Foundations Active Vision Physics Based Vision: Color Models Real Time Tracking Systems Appearance Based Vision Possible Future Paradigms and Conclusions 15

16 Homogeneous Coordinates
Image point: P = (x, y, 1)^T
Image line: L = (a, b, c)
Scene point: Q = (x, y, z, 1)^T
Scene plane: S = (a, b, c, d)
Line equation: L^T P = ax + by + c = 0
Plane equation: S^T Q = ax + by + cz + d = 0 16
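The algebra on this slide is easy to exercise directly: in homogeneous coordinates, incidence of a point on a line is a dot product, and the line through two points is their cross product. A minimal sketch in pure Python (the helper names are mine, not from the slides):

```python
def cross(u, v):
    """Cross product of two 3-vectors. In homogeneous coordinates, the
    cross product of two image points is the line through them (and the
    cross product of two lines is their intersection point)."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def incident(L, P, eps=1e-9):
    """Line equation: L^T P = a x + b y + c = 0 tests incidence."""
    return abs(sum(l * p for l, p in zip(L, P))) < eps

# Line through the image points (0, 0) and (1, 1):
P1, P2 = (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)
L = cross(P1, P2)                     # (-1, 1, 0): the line -x + y = 0
print(incident(L, (2.0, 2.0, 1.0)))   # (2, 2) lies on that diagonal -> True
```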

17 Projective Camera Model (Figure: a scene point projects through the pinhole onto the retina.)
A 3x4 projection matrix M maps a scene point Q to an image point P:
w P = M Q, i.e.
(wi, wj, w)^T = [m11 m12 m13 m14; m21 m22 m23 m24; m31 m32 m33 m34] (x, y, z, 1)^T

18 Homographic Projection A homography is a bijective projection from one plane to another:
w P_b = H_ab P_a, i.e.
(w x_b, w y_b, w)^T = [h11 h12 h13; h21 h22 h23; h31 h32 h33] (x_a, y_a, 1)^T
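Applying the homography above amounts to one 3x3 matrix-vector product followed by division by the homogeneous coordinate w. A sketch (the matrix values are an arbitrary illustration, not from the slides):

```python
def apply_homography(H, pt):
    """Map an image point (x, y) through a 3x3 homography:
    (wx', wy', w)^T = H (x, y, 1)^T, then divide by w."""
    x, y = pt
    wx = H[0][0] * x + H[0][1] * y + H[0][2]
    wy = H[1][0] * x + H[1][1] * y + H[1][2]
    w  = H[2][0] * x + H[2][1] * y + H[2][2]
    return (wx / w, wy / w)

# A homography with a translation and a perspective term (h32 != 0):
H = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 3.0],
     [0.0, 0.5, 1.0]]
print(apply_homography(H, (4.0, 2.0)))  # -> (3.0, 2.5)
```

Note that because of the division by w, a homography is not a linear map on (x, y); straight lines are preserved but ratios of distances are not.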

19 Fundamental Matrix A point in one image projects to a line (the epipolar line) in a second image:
L_B = F_AB P_A, i.e.
(a, b, c)^T = [f11 f12 f13; f21 f22 f23; f31 f32 f33] (x_a, y_a, 1)^T
(Figure: two cameras with epipoles e_a, e_b on the base line, and matched points p_a in image A and q_b in image B.)
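The epipolar constraint above can be demonstrated with the simplest fundamental matrix there is: for a pure horizontal translation between the cameras (rectified stereo), F = [t]_x with t = (1, 0, 0), so every epipolar line is horizontal. A sketch (my example values):

```python
def epipolar_line(F, p_a):
    """L_B = F_AB P_A: the line in image B on which the match of p_a must lie."""
    x, y = p_a
    P = (x, y, 1.0)
    return tuple(sum(F[r][c] * P[c] for c in range(3)) for r in range(3))

# F for a pure translation t = (1, 0, 0): the skew-symmetric matrix [t]_x.
F = [[0, 0, 0],
     [0, 0, -1],
     [0, 1, 0]]

l_b = epipolar_line(F, (3.0, 2.0))
print(l_b)  # (0.0, -1.0, 2.0): the horizontal line y = 2 in image B
```

Any candidate match (x', 2) satisfies 0*x' - 1*2 + 2 = 0, which is why rectified stereo can search for correspondences along a single scanline.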

20 Trifocal Tensor A geometric relation between points in three images: R_c = T_ab^c(P_a, Q_b). Correspondence of 29 points in three images gives the transformation for all other points. (Figure: three views A, B, C with matched points p_a, q_b, r_c and the epipole e_c.) 20

21 Multi-camera Geometry References: R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press. O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, MIT Press.

22 What does it mean to see? Outline: The Science of Computer Vision Early Paradigms Geometric Foundations Active Vision Physics Based Vision: Color Models Real Time Tracking Systems Appearance Based Vision Possible Future Paradigms and Conclusions 22

23 Active Vision Approach: Use control of cameras to simplify observations of the external world. The observations are defined by the requirements of a task. An active vision system acts as a "FILTER" for information. Problems: 1) Integration a) of processing modes b) with (robotic) applications 2) Robust, real-time image processing 3) Control of fixation and attention 4) Control of processing. 23

24 Active Vision Control of sensors and processing to make observations about the external world. Real Time: An active vision system must return its results within a fixed delay. Attention is limited: Fixed delay response requires limiting the data. This means restricting processing to a small region of interest. An active vision system acts as a FILTER for information. 24

25 Active Vision Continuous operation: The system is always running. The results at each instant provide the context for the next instant. Processing and parameters are determined "on the fly" The vision system is a series of filters : Fixation: The fixation point and the horopter Color: The color filters ROI: The region of interest Description: Receptive fields Attention: Context from task and activity 25

26 The LIFIA Camera Head (1991) 26

27 Platform for Visual Navigation (Architecture diagram: a Supervisor communicates through a Mailbox with device controllers; VisionClips image processing processes run on a Silicon Graphics workstation; a fixation camera controller, and navigation and vehicle controllers on a Sun, drive the Albatros vehicle.) 27

28 LIFIA Binocular Head 28

29 Why Fixate? 1) Fixation makes real time processing possible 2) Fixation cancels motion blur. 3) Fixation separates figure from ground. Challenge: Real time Control. Multi-cue Integration 29

30 The Human Eye (Figure: cornea, fovea (cones), periphery (rods), optic nerve.) The retina is composed of the fovea and the periphery. The fovea provides precise acuity for recognition. The periphery guides fixation and triggers reflexes. 30

31 The Horopter Horopter: the locus of points in the scene which map to (nearly) the same location in the two images (disparities between -σ and +σ around the fixation point). The Horopter is a FILTER for information: it permits a simple separation of figure and ground. 31

32 KTH Binocular Head (1992) 32

33 KTH Binocular Head (1992) 33

34 Vergence and Version: The Vieth-Müller Circle Zero disparity surface: h_l = h_r. Stereo Fusion Reflex: map corresponding image points to the same image position (much easier than stereo matching). (Figure: fixation point F and interest point P, primary lines of sight η_l, η_r, vergence angles α_l, α_r, and corresponding visual rays on the left and right retinas.) 34

35 Vergence and Version: The Vieth-Müller Circle (Figure: symmetric vergence, vergence, and version configurations, defined by the gaze angles η and the vergence angles α_l, α_r.) 35

36 KTH Binocular Head (1992) 36

37 Active Vision References: J. Y. Aloimonos, I. Weiss and A. Bandopadhay, "Active Vision", International Journal of Computer Vision. R. Bajcsy, "Active Perception", Proceedings of the IEEE, Vol. 76, No. 8. D. H. Ballard and A. Ozcandarli, "Eye Fixation and Early Vision: Kinematic Depth", IEEE 2nd Intl. Conf. on Computer Vision, Tarpon Springs, Fla. J. L. Crowley and H. I. Christensen, Vision as Process, Springer Verlag.

38 What does it mean to see? Outline: The Science of Computer Vision Early Paradigms Geometric Foundations Active Vision Physics Based Vision: Color Models Real Time Tracking Systems Appearance Based Vision Possible Future Paradigms and Conclusions 39

39 Albedo: Reflectance Functions Reflectance functions describe the interaction of light with matter: R(i, e, g, λ) = number of photons emitted / number of photons received, where i is the incident angle, e the emitted (viewing) angle, g the phase angle, and λ the wavelength. Lambertian reflection (e.g. paper, snow): R_L(i, λ) = P(λ) cos(i). Specular reflection (e.g. a mirror): R_S(i, e, g, λ) = 1 if i = e and i + e = g, 0 otherwise. 40

40 Albedo: Reflectance Functions An arbitrary reflectance function can be modeled as a weighted sum of Lambertian and specular reflection: R(i, e, g, λ) = c R_S(i, e, g, λ) + (1 - c) R_L(i, λ), with R_L(i, λ) = P(λ) cos(i) and R_S(i, e, g, λ) = 1 if i = e and i + e = g, 0 otherwise. 41

41 Luminance and Chrominance Lambertian reflection, R_L(i, λ) = P(λ) cos(i), can be decomposed into luminance and chrominance. Luminance is determined by surface orientation (describes 3D shape). Chrominance identifies object pigments (a signature for object recognition). 42

42 Color Perception (Figure: the retina, with the fovea (cones), the periphery (rods), the cornea and the optic nerve.) Day vision: high acuity, chromatic. Three cone pigments: cyanolabe (445 nm), chlorolabe (535 nm), and erythrolabe (570 nm). Night vision: achromatic. A single pigment: rhodopsin (510 nm). 43

43 Color Perception (Figure: relative responses of the three cone classes α, β, γ as a function of wavelength.) Day vision: high acuity, chromatic, with three pigments: cyanolabe (445 nm), chlorolabe (535 nm), and erythrolabe (570 nm). Color perception is subjective, based on the relative logarithmic responses of the three color channels. 44

44 Hue, Luminance and Saturation The HLS color space. Luminance: relative intensity (black-white), along the luminance axis. Hue: angle in the chromatic plane. Saturation: radial distance in the chromatic plane. 45

45 Color Perception (Figure: spectral sensitivity curves b(λ), g(λ), r(λ) of the blue, green and red filters.) Color cameras imitate the eye with three filters: RGB. 46

46 Color Spaces: RGB The RGB color cube: black at the origin, white at the far corner on the achromatic axis, and red, green, blue with their complements cyan, magenta, yellow at the remaining corners. The Maxwell triangle is the chromatic plane R + G + B = 1; its "complementary" plane is R + G + B = 2. RGB considers each filter as an orthogonal sample of the spectrum. 47

47 Example: Skin Detection with Color Skin probability: transform RGB pixels into a probability of skin. Theory: Bayes rule. Implementation: table lookup. (The demonstration interface shows a region of interest, sample rate, color table, average probability and execution time.) 48

48 Probabilistic Detection of Skin
Chrominance: r = R / (R + G + B), g = G / (R + G + B)
Probability of all colors: p(r, g) ≈ (1/N_Tot) h_Tot(r, g)
Probability of skin colors: p(r, g | skin) ≈ (1/N_skin) h_skin(r, g)
Bayes rule: p(skin | r, g) = p(r, g | skin) p(skin) / p(r, g) ≈ h_skin(r, g) / h_Tot(r, g) = h_ratio(r, g) 49
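The Bayes-rule table lookup described on this slide can be sketched in a few lines: quantize chrominance (r, g) into bins, histogram all pixels and the skin-labeled pixels, and store the ratio. All names and the toy pixel values below are illustrative assumptions, not the author's implementation:

```python
from collections import Counter

def chrominance_bin(rgb, n_bins=32):
    """Quantized chrominance (r, g) = (R, G) / (R + G + B); dividing by the
    sum factors out luminance, leaving a pigment signature."""
    R, G, B = rgb
    s = float(R + G + B) or 1.0   # guard against a black pixel
    r, g = R / s, G / s
    return (min(int(r * n_bins), n_bins - 1),
            min(int(g * n_bins), n_bins - 1))

def build_ratio_table(all_pixels, skin_pixels):
    """h_ratio(r, g) = h_skin(r, g) / h_Tot(r, g) ~ p(skin | r, g) by Bayes rule."""
    h_tot = Counter(chrominance_bin(p) for p in all_pixels)
    h_skin = Counter(chrominance_bin(p) for p in skin_pixels)
    return {c: h_skin[c] / h_tot[c] for c in h_tot}

def p_skin(table, rgb):
    """Table lookup: probability that a pixel is skin (0 for unseen colors)."""
    return table.get(chrominance_bin(rgb), 0.0)

# Toy data: reddish "skin" pixels among a bluish background (made-up values).
skin = [(200, 120, 90)] * 8
background = [(30, 60, 200)] * 12
table = build_ratio_table(skin + background, skin)
print(p_skin(table, (200, 120, 90)))  # -> 1.0
print(p_skin(table, (30, 60, 200)))   # -> 0.0
```

The lookup costs one division and one dictionary access per pixel, which is what makes real-time skin detection feasible.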

49 Detecting Skin Pixels with Color Using chrominance to detect hands. (Demonstration interface: sample rate, color table, average probability, execution time.) 50

50 What does it mean to see? Outline: The Science of Computer Vision Early Paradigms Geometric Foundations Active Vision Physics Based Vision: Color Models Real Time Tracking Systems Appearance Based Vision Possible Future Paradigms and Conclusions 51

51 Blob Grouping Moment based grouping of the detected probability image within a region of interest produces (Blob, ID, CF, x, y, s_x, s_y, θ). Blob: a connected region of detected pixels. Properties: position and spatial extent. Theory: moments 52

52 Blob Grouping
Confidence: S = Σ_{i=i_min..i_max} Σ_{j=j_min..j_max} p_skin(i, j)
Position: µ_i = (1/S) Σ_i Σ_j p_skin(i, j) i,  µ_j = (1/S) Σ_i Σ_j p_skin(i, j) j
Spatial extent: σ_i² = (1/S) Σ_i Σ_j p_skin(i, j) (i - µ_i)²,  σ_j² = (1/S) Σ_i Σ_j p_skin(i, j) (j - µ_j)²,  σ_ij = (1/S) Σ_i Σ_j p_skin(i, j) (i - µ_i)(j - µ_j) 53
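The moments above translate directly into code: one pass for the total mass and means, one for the second moments. A pure-Python sketch over a small probability image (function name and toy data are mine):

```python
def blob_moments(prob):
    """Confidence S, position (mu_i, mu_j) and spatial extent
    (sigma_i^2, sigma_j^2, sigma_ij) of a probability image prob[i][j]."""
    S = sum(sum(row) for row in prob)
    mu_i = sum(i * p for i, row in enumerate(prob) for p in row) / S
    mu_j = sum(j * p for row in prob for j, p in enumerate(row)) / S
    s_ii = sum(p * (i - mu_i) ** 2
               for i, row in enumerate(prob) for p in row) / S
    s_jj = sum(p * (j - mu_j) ** 2
               for row in prob for j, p in enumerate(row)) / S
    s_ij = sum(p * (i - mu_i) * (j - mu_j)
               for i, row in enumerate(prob) for j, p in enumerate(row)) / S
    return S, (mu_i, mu_j), (s_ii, s_jj, s_ij)

# A single confident detection at the centre of a 3x3 region of interest.
prob = [[0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0]]
S, mu, sigma = blob_moments(prob)
print(S, mu)  # -> 1.0 (1.0, 1.0)
```

The second moments (σ_i², σ_j², σ_ij) define the covariance ellipse that gives the blob its extent s_x, s_y and orientation θ.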

54 Tracking: Recursive Estimation The tracking cycle: Predict the region of interest, Detect within it, Match the observation (Y_t, C_y, CF) to the prediction, Update the estimated state (X̂_t, Ĉ_t, CF), and Predict the state (X*_{t+Δt}, C_{t+Δt}, CF) for the next image. Tracking: 1) Optimises processing by focus of attention 2) Maintains target identity and properties across time. 55
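The predict / match / update cycle can be illustrated with the simplest recursive estimator there is: a one-dimensional alpha filter with a constant-velocity prediction. This is an illustrative sketch, not the system described on the slides:

```python
def track(observations, alpha=0.5):
    """Minimal predict / match / update loop: predict with constant velocity,
    take the residual between the detected position and the prediction,
    and blend a fraction alpha of it back into position and velocity."""
    x, v = observations[0], 0.0
    history = [x]
    for z in observations[1:]:
        x_pred = x + v                  # Predict: expected position (the ROI centre)
        residual = z - x_pred           # Match: detection vs. prediction
        x = x_pred + alpha * residual   # Update the position estimate...
        v = v + alpha * residual        # ...and the velocity estimate
        history.append(x)
    return history

# A target moving one unit per frame: the estimate converges toward it.
print(track([0.0, 1.0, 2.0, 3.0])[-1])  # -> 2.75
```

A Kalman filter follows exactly this cycle, but derives the blending gain from the predicted covariance instead of a fixed alpha, which is what makes the confidence factor CF maintainable across time.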

55 Robust Tracking Multiply pixels with a Gaussian function based on the detection in the previous image:
p_skin(i, j) := p_skin(i, j) exp( -(1/2) (i - µ_i, j - µ_j) (kC)^{-1} (i - µ_i, j - µ_j)^T )
where µ and C are the blob position and covariance estimated at time t - Δt. 56
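For a diagonal covariance the Gaussian re-weighting above reduces to two squared, normalized distances per pixel. A sketch under that simplifying assumption (function name and the inflation factor k are mine):

```python
import math

def reweight(prob, mu_i, mu_j, var_i, var_j, k=4.0):
    """Multiply the detection image by a Gaussian prior centred on the
    previous blob estimate (mu, C), here with a diagonal covariance
    inflated by k; pixels far from the predicted target are suppressed."""
    return [[p * math.exp(-0.5 * ((i - mu_i) ** 2 / (k * var_i)
                                  + (j - mu_j) ** 2 / (k * var_j)))
             for j, p in enumerate(row)]
            for i, row in enumerate(prob)]

# Uniform detections: after re-weighting, only pixels near the previous
# estimate (0, 0) keep their full probability.
prob = [[1.0, 1.0],
        [1.0, 1.0]]
out = reweight(prob, 0.0, 0.0, 1.0, 1.0)
print(out[0][0])  # -> 1.0 at the predicted position; weights decay away from it
```

This is what makes the tracker robust: a skin-colored distractor far from the predicted position is attenuated before grouping.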

57 Multi-Cue Face Tracking A supervisor coordinates blink detection, an eye detector (correlation) and a face detector (color). Blink detection: precise but infrequent. Correlation: fast and precise but fragile. Probabilistic chrominance: slower and less precise, but reliable. Approach: coordinate multiple redundant detection processes. J. L. Crowley and F. Berard, "Multi-Modal Tracking of Faces for Video Communications", IEEE Conference on Computer Vision and Pattern Recognition, CVPR '97, St. Juan, Puerto Rico, June 1997.

58 Multi-Cue Face Tracking 59

59 Multi-Cue Face Tracking (Architecture diagram: a Supervisor coordinates image acquisition and processing, blink detection, color detection, correlation tracking and camera control, with an interpreter combining the blink, SSD and color measurements.)

60 Blue Eye Video Entity Detection and Tracking Process A video demon feeds the video stream to observation modules; detection, prediction and estimation maintain a list of entities, and event detection emits events, all within a CORBA shell. Hardwired control in C++. Communication using CORBA. Observation modules: color histogram ratio, background difference, motion history image 61

61 PETS Benchmark #2 62

62 Blue Eye Video Activity Sensor (PETS 2002 Data) 63

63 Blue Eye Video Activity Sensor (Intersection Event Observation) 64

64 CAVIAR Indoor Test-bed: INRIA Entrance Hall 2 cameras: one with a wide-angle lens, one steerable pan-tilt-zoom 65

66 Left-behind Baggage Detection 67

67 CAVIAR Outdoor Test Bed INRIA back parking lot. 2 outdoor surveillance platforms, 3 m separation, 3 m height 68

68 Back Parking Lot Behaviour Analysis 69

69 Tracking, Recognition and Attention Lesson: detect, track, then recognize. Tracking focuses attention for recognition. Tracking: 1) Conserves identity 2) Focuses processing resources 3) Provides robustness to noise 4) Permits temporal fusion 70

70 What does it mean to See? Outline: The Science of Computer Vision Early Paradigms Geometric Foundations Active Vision Real Time Tracking Systems Software Architectures Possible Future Paradigms Conclusions 71

71 Supervised Perceptual Process An autonomic supervisor accepts configuration, requests for state and commands, and emits events, the current state and responses to commands. It drives detection (ROI, scale, detection method), prediction and estimation over the video stream through observation modules, maintaining entities, with an interpretation stage that maintains actors. The supervisor provides: an execution scheduler, a parameter regulator, a command interpreter, and a description of state and capabilities 72

72 Supervised Perceptual Process Process phases: While True Do: acquire next image; calculate ROI for targets; verify and update targets; detect new targets; regulate module parameters; interpret entities; process messages 73
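The process phases above form a simple cyclic loop. The sketch below stubs the module behaviour (detection is just thresholding, and the class and method names are hypothetical, not the author's API) to show how verification of existing targets and detection of new ones interleave each cycle:

```python
class SupervisedProcess:
    """Minimal sketch of one cycle of the supervised process loop, with
    stubbed observation modules: a 'detection' is any pixel above threshold."""

    def __init__(self, stream, threshold=0.5):
        self.stream = iter(stream)
        self.threshold = threshold
        self.targets = []              # list of (i, j) positions being tracked

    def run_once(self):
        image = next(self.stream)      # Acquire next image
        rois = list(self.targets)      # Calculate ROI for each current target
        # Verify and update targets: keep those still detected in their ROI.
        self.targets = [(i, j) for (i, j) in rois
                        if image[i][j] > self.threshold]
        # Detect new targets anywhere above threshold.
        for i, row in enumerate(image):
            for j, p in enumerate(row):
                if p > self.threshold and (i, j) not in self.targets:
                    self.targets.append((i, j))
        return self.targets

proc = SupervisedProcess([[[0.0, 0.9], [0.0, 0.0]],
                          [[0.0, 0.9], [0.8, 0.0]]])
print(proc.run_once())  # -> [(0, 1)]
print(proc.run_once())  # -> [(0, 1), (1, 0)]
```

A real supervisor would also regulate module parameters and answer messages each cycle; those phases are omitted here for brevity.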

73 Autonomic Properties provided by process supervisor Auto-regulatory: The process controller can adapt parameters to maintain a desired process state. Auto-descriptive: The process controller provides descriptions of the capabilities and the current state of the process. Auto-critical: Process estimates confidence for all properties and events. 74

74 Categories of Processes Entity Detection and Tracking Processes. Input data: a sensor stream (video, acoustic, tactile). Output data: a list of entities with properties. Events: detection, loss, entry into a region, exit. Control in: state and capabilities. 75

75 Categories of Processes Relation Observation Processes observe relations Relation(E_1, ..., E_m) over entities. Input data: entities with properties. Output data: a list of relations. Events: detection or loss of a relation. Control in: state and capabilities. 76

76 Categories of Processes Composition Observation Processes group entities E_1, ..., E_m into composite objects. Input data: entities with properties. Output data: a list of composite objects with CF. Events: detection or loss of a composite object. Control in: state and capabilities. 77

77 Example: Hand and Face Observer The FaceAndHand observer composes entity detection and tracking with entity composition to produce torso, face and hands. Entity tracker: background difference and color. Entity grouper: assigns roles to blobs 78

78 Example: Hand and Face Observer 79

79 Supervised Perceptual Process (Same architecture as slide 71: an autonomic supervisor over detection, prediction, estimation and interpretation.) Observation modules: color histogram ratio, local appearance, background difference, motion history image. Local appearance is described by receptive fields 96

80 What does it mean to see? Outline: The Science of Computer Vision Early Paradigms Geometric Foundations Active Vision Physics Based Vision: Color Models Real Time Tracking Systems Appearance Based Vision Possible Future Paradigms and Conclusions 97

81 Appearance The set of all possible images of an object, scene or event. 98

82 The Appearance Manifold The pixels of an image define a vector. The space of all images of an object is a manifold (the appearance manifold). The dimensions of the appearance manifold are the parameters of image formation. Problem: representing the appearance manifold 99

83 Plenoptic Dimensions Plenoptic function: A(x, y, ϕ, γ, R, s, θ, Λ) x, y - image position ϕ, γ - latitude and longitude on the view sphere (2) R - radius of the view sphere (1) s - image scale factor (1) θ - image plane rotation (1) Λ - illumination (2 or more) 100

84 Sampled Plenoptic Dimensions Sampled plenoptic image function: A(i, j, m, n, θ, s) i, j - image coordinates m, n - latitude and longitude s - image scale factor (includes R, the view sphere radius) θ - image plane rotation Λ - illumination assumed constant 101

85 Receptive Field Manifolds M_{m,n}(i, j) = < A(i, j, m, n), ϑ(i, j; σ, θ) > ϑ_k: a vector of receptive fields. M_k: a vector of local features for indexing and recognition. 102

86 Receptive Field Manifolds M_{m,n}(i, j) = < A(i, j, m, n), ϑ(i, j; σ, θ) > Problem: define the receptive field functions 103

87 Chromatic Appearance Transform color images from (R, G, B) to a luminance-chrominance space by a linear transform: (L, C_1, C_2)^T = M (R, G, B)^T 105

88 Chromatic receptive fields Luminance: information about object shape. Chrominance: a signature for recognition. (Channels: luminance, with chrominance r - g and r + g - b.) 106

89 Gaussian RFs are Steerable in Orientation
Intrinsic orientation: θ_i(i, j) = tan⁻¹( < G_y · A(i, j) > / < G_x · A(i, j) > )
Receptive field response at orientation θ: < G_x^θ · A(i, j) > = < G_x · A(i, j) > cos(θ) + < G_y · A(i, j) > sin(θ) 107
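The steering property above means that two fixed filter responses suffice to synthesize the response at any orientation. The sketch below uses 3x3 Sobel kernels as a crude stand-in for sampled Gaussian derivatives G_x, G_y (an assumption for the sake of a self-contained example):

```python
import math

def convolve_at(img, i, j, k):
    """Response of a 3x3 kernel k centred at pixel (i, j)."""
    return sum(k[u][v] * img[i + u - 1][j + v - 1]
               for u in range(3) for v in range(3))

# Sobel kernels, used here as rough approximations of G_x and G_y.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def steered_response(img, i, j, theta):
    """<G_x^theta * A> = <G_x * A> cos(theta) + <G_y * A> sin(theta)."""
    gx = convolve_at(img, i, j, GX)
    gy = convolve_at(img, i, j, GY)
    return gx * math.cos(theta) + gy * math.sin(theta)

def intrinsic_orientation(img, i, j):
    """theta_i = atan2(<G_y * A>, <G_x * A>): direction of maximal response."""
    return math.atan2(convolve_at(img, i, j, GY), convolve_at(img, i, j, GX))

# A vertical step edge: the gradient points along +x, so theta_i is 0.
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
print(intrinsic_orientation(img, 1, 1))  # -> 0.0
```

Evaluating descriptors at the intrinsic orientation makes the local feature vector invariant to image-plane rotation.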

90 Gaussian RFs are Steerable in Scale The intrinsic scale is an invariant for appearance: σ_i(i, j) = arg max_σ { < ∇²G(σ) · A(i, j) > } 108

91 Probabilistic Recognition using Receptive Field Histograms
Feature vector: v_k(x, y) = < ϕ_k, A(x, y) >
Probability of local appearance: p(v) ≈ (1/N_Tot) h_Tot(v)
Probability of appearance given class C: p(v | C) ≈ (1/N_C) h_C(v)
Probability of class C: p(C) ≈ N_C / N_Tot
Probability of class C given appearance: p(C | v) = p(v | C) p(C) / p(v) ≈ h_C(v) / h_Tot(v) 109

92 View Invariant Person Detection 110

93 View Invariant Person Detection 111

94 Local Appearance Manifold V(i, j) = < A(i, j), ϑ > A region of an image is a surface in receptive field space. The set of images of an object is a manifold. Position in receptive field space allows pixel level matching 112

95 Recognition by Prediction-Verification Establish correspondence with the most salient point. Propagate correspondence to neighboring pixels. 113

96 Local Appearance 114

97 View Invariant Person Recognition 115

98 What does it mean to see? Outline: The Science of Computer Vision Early Paradigms Geometric Foundations Active Vision Physics Based Vision: Color Models Real Time Tracking Systems Appearance Based Vision Possible Future Paradigms and Conclusions 116

99 Lessons from Biology Seeing is reaction to visual stimulus. Within vertebrates, visual perception is organised as a series of associations mediated by filters. Fixation is mediated in the superior colliculus. Attention is mediated by a range of structures from the brainstem through the visual cortex to the frontal cortex. Vision develops as part of a sensori-motor system whose primary purpose is homeostasis (internal and external). Biological vision is very specific to the structure of the organism and the nature of the environment. 117

100 Possible Future Paradigms Biologically inspired vision: insect vision (Franceschini et al.); vertebrate vision (imitating the human visual architecture) (cf. the current IST-FET call for proposals). Ambient perception (part of ambient intelligence): ad hoc networks of large numbers of embedded devices with communication, sensing, display and computing 118

101 Conclusions: What does it mean to see? Human vision is the reaction to visual stimuli. Visual skills are formed by experience. Visual reactions are mediated by experience and goals. Human vision is part of sensori-motor and sociological interaction. The engineering science of machine vision requires: foundations from geometry and signal analysis; techniques for learning visual skills; software engineering techniques for integration and control; a theory of vision systems. The field is evolving rapidly, but we have far to go. 119

102 What does it mean to see? Outline: The Science of Computer Vision Early Paradigms Geometric Foundations Active Vision Physics Based Vision: Color Models Real Time Tracking Systems Appearance Based Vision Possible Future Paradigms and Conclusions 120

103 What does it mean to see? The engineering science of computer vision James L. Crowley Professor, I.N.P. Grenoble Projet PRIMA - Laboratory GRAVIR INRIA Rhône Alpes Grenoble, France 121

104 The Conferences and Journals of Computer Vision Journals: IJCV: International Journal of Computer Vision PAMI: IEEE Transactions on Pattern Analysis and Machine Intelligence IVC: Image and Vision Computing Conferences: ICCV: International Conference on Computer Vision ECCV: European Conference on Computer Vision CVPR: IEEE Conference on Computer Vision and Pattern Recognition ICVS: International Conference on Vision Systems 122


More information

Probabilistic Tracking and Reconstruction of 3D Human Motion in Monocular Video Sequences

Probabilistic Tracking and Reconstruction of 3D Human Motion in Monocular Video Sequences Probabilistic Tracking and Reconstruction of 3D Human Motion in Monocular Video Sequences Presentation of the thesis work of: Hedvig Sidenbladh, KTH Thesis opponent: Prof. Bill Freeman, MIT Thesis supervisors

More information

A Simple Vision System

A Simple Vision System Chapter 1 A Simple Vision System 1.1 Introduction In 1966, Seymour Papert wrote a proposal for building a vision system as a summer project [4]. The abstract of the proposal starts stating a simple goal:

More information

All human beings desire to know. [...] sight, more than any other senses, gives us knowledge of things and clarifies many differences among them.

All human beings desire to know. [...] sight, more than any other senses, gives us knowledge of things and clarifies many differences among them. All human beings desire to know. [...] sight, more than any other senses, gives us knowledge of things and clarifies many differences among them. - Aristotle University of Texas at Arlington Introduction

More information

Computer Vision Lecture 17

Computer Vision Lecture 17 Computer Vision Lecture 17 Epipolar Geometry & Stereo Basics 13.01.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar in the summer semester

More information

Digital Image Processing COSC 6380/4393

Digital Image Processing COSC 6380/4393 Digital Image Processing COSC 6380/4393 Lecture 21 Nov 16 th, 2017 Pranav Mantini Ack: Shah. M Image Processing Geometric Transformation Point Operations Filtering (spatial, Frequency) Input Restoration/

More information

COMPUTER VISION. Dr. Sukhendu Das Deptt. of Computer Science and Engg., IIT Madras, Chennai

COMPUTER VISION. Dr. Sukhendu Das Deptt. of Computer Science and Engg., IIT Madras, Chennai COMPUTER VISION Dr. Sukhendu Das Deptt. of Computer Science and Engg., IIT Madras, Chennai 600036. Email: sdas@iitm.ac.in URL: //www.cs.iitm.ernet.in/~sdas 1 INTRODUCTION 2 Human Vision System (HVS) Vs.

More information

Computer Vision Lecture 17

Computer Vision Lecture 17 Announcements Computer Vision Lecture 17 Epipolar Geometry & Stereo Basics Seminar in the summer semester Current Topics in Computer Vision and Machine Learning Block seminar, presentations in 1 st week

More information

Feature Transfer and Matching in Disparate Stereo Views through the use of Plane Homographies

Feature Transfer and Matching in Disparate Stereo Views through the use of Plane Homographies Feature Transfer and Matching in Disparate Stereo Views through the use of Plane Homographies M. Lourakis, S. Tzurbakis, A. Argyros, S. Orphanoudakis Computer Vision and Robotics Lab (CVRL) Institute of

More information

Sensor Modalities. Sensor modality: Different modalities:

Sensor Modalities. Sensor modality: Different modalities: Sensor Modalities Sensor modality: Sensors which measure same form of energy and process it in similar ways Modality refers to the raw input used by the sensors Different modalities: Sound Pressure Temperature

More information

Simultaneous Vanishing Point Detection and Camera Calibration from Single Images

Simultaneous Vanishing Point Detection and Camera Calibration from Single Images Simultaneous Vanishing Point Detection and Camera Calibration from Single Images Bo Li, Kun Peng, Xianghua Ying, and Hongbin Zha The Key Lab of Machine Perception (Ministry of Education), Peking University,

More information

Visual Recognition: Image Formation

Visual Recognition: Image Formation Visual Recognition: Image Formation Raquel Urtasun TTI Chicago Jan 5, 2012 Raquel Urtasun (TTI-C) Visual Recognition Jan 5, 2012 1 / 61 Today s lecture... Fundamentals of image formation You should know

More information

Local features: detection and description. Local invariant features

Local features: detection and description. Local invariant features Local features: detection and description Local invariant features Detection of interest points Harris corner detection Scale invariant blob detection: LoG Description of local patches SIFT : Histograms

More information

Physics-based Vision: an Introduction

Physics-based Vision: an Introduction Physics-based Vision: an Introduction Robby Tan ANU/NICTA (Vision Science, Technology and Applications) PhD from The University of Tokyo, 2004 1 What is Physics-based? An approach that is principally concerned

More information

Feature descriptors and matching

Feature descriptors and matching Feature descriptors and matching Detections at multiple scales Invariance of MOPS Intensity Scale Rotation Color and Lighting Out-of-plane rotation Out-of-plane rotation Better representation than color:

More information

EECS150 - Digital Design Lecture 14 FIFO 2 and SIFT. Recap and Outline

EECS150 - Digital Design Lecture 14 FIFO 2 and SIFT. Recap and Outline EECS150 - Digital Design Lecture 14 FIFO 2 and SIFT Oct. 15, 2013 Prof. Ronald Fearing Electrical Engineering and Computer Sciences University of California, Berkeley (slides courtesy of Prof. John Wawrzynek)

More information

Multi-Modal Tracking of Faces for Video Communications

Multi-Modal Tracking of Faces for Video Communications Multi-Modal Tracking of Faces for Video Communications James L. Crowley and Francois Berard GRAVIR - IMAG, I.N.P. Grenoble 46 Ave Félix Viallet 38031 Grenoble, France Abstract This paper describes a system

More information

Flexible Calibration of a Portable Structured Light System through Surface Plane

Flexible Calibration of a Portable Structured Light System through Surface Plane Vol. 34, No. 11 ACTA AUTOMATICA SINICA November, 2008 Flexible Calibration of a Portable Structured Light System through Surface Plane GAO Wei 1 WANG Liang 1 HU Zhan-Yi 1 Abstract For a portable structured

More information

Binocular Stereo Vision. System 6 Introduction Is there a Wedge in this 3D scene?

Binocular Stereo Vision. System 6 Introduction Is there a Wedge in this 3D scene? System 6 Introduction Is there a Wedge in this 3D scene? Binocular Stereo Vision Data a stereo pair of images! Given two 2D images of an object, how can we reconstruct 3D awareness of it? AV: 3D recognition

More information

Brand Identification Using Gaussian Derivative Histograms

Brand Identification Using Gaussian Derivative Histograms Machine Vision and Applications manuscript No. (will be inserted by the editor) Brand Identification Using Gaussian Derivative Histograms Daniela Hall, Fabien Pélisson, Olivier Riff, James L. Crowley PRIMA

More information

Computer Vision I - Appearance-based Matching and Projective Geometry

Computer Vision I - Appearance-based Matching and Projective Geometry Computer Vision I - Appearance-based Matching and Projective Geometry Carsten Rother 01/11/2016 Computer Vision I: Image Formation Process Roadmap for next four lectures Computer Vision I: Image Formation

More information

An Overview of Matchmoving using Structure from Motion Methods

An Overview of Matchmoving using Structure from Motion Methods An Overview of Matchmoving using Structure from Motion Methods Kamyar Haji Allahverdi Pour Department of Computer Engineering Sharif University of Technology Tehran, Iran Email: allahverdi@ce.sharif.edu

More information

COSC579: Scene Geometry. Jeremy Bolton, PhD Assistant Teaching Professor

COSC579: Scene Geometry. Jeremy Bolton, PhD Assistant Teaching Professor COSC579: Scene Geometry Jeremy Bolton, PhD Assistant Teaching Professor Overview Linear Algebra Review Homogeneous vs non-homogeneous representations Projections and Transformations Scene Geometry The

More information

Stereo Vision. MAN-522 Computer Vision

Stereo Vision. MAN-522 Computer Vision Stereo Vision MAN-522 Computer Vision What is the goal of stereo vision? The recovery of the 3D structure of a scene using two or more images of the 3D scene, each acquired from a different viewpoint in

More information

Homographies and RANSAC

Homographies and RANSAC Homographies and RANSAC Computer vision 6.869 Bill Freeman and Antonio Torralba March 30, 2011 Homographies and RANSAC Homographies RANSAC Building panoramas Phototourism 2 Depth-based ambiguity of position

More information

CAP 5415 Computer Vision Fall 2012

CAP 5415 Computer Vision Fall 2012 CAP 5415 Computer Vision Fall 01 Dr. Mubarak Shah Univ. of Central Florida Office 47-F HEC Lecture-5 SIFT: David Lowe, UBC SIFT - Key Point Extraction Stands for scale invariant feature transform Patented

More information

Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting

Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting R. Maier 1,2, K. Kim 1, D. Cremers 2, J. Kautz 1, M. Nießner 2,3 Fusion Ours 1

More information

Laser sensors. Transmitter. Receiver. Basilio Bona ROBOTICA 03CFIOR

Laser sensors. Transmitter. Receiver. Basilio Bona ROBOTICA 03CFIOR Mobile & Service Robotics Sensors for Robotics 3 Laser sensors Rays are transmitted and received coaxially The target is illuminated by collimated rays The receiver measures the time of flight (back and

More information

Depth Measurement and 3-D Reconstruction of Multilayered Surfaces by Binocular Stereo Vision with Parallel Axis Symmetry Using Fuzzy

Depth Measurement and 3-D Reconstruction of Multilayered Surfaces by Binocular Stereo Vision with Parallel Axis Symmetry Using Fuzzy Depth Measurement and 3-D Reconstruction of Multilayered Surfaces by Binocular Stereo Vision with Parallel Axis Symmetry Using Fuzzy Sharjeel Anwar, Dr. Shoaib, Taosif Iqbal, Mohammad Saqib Mansoor, Zubair

More information

Computational Foundations of Cognitive Science

Computational Foundations of Cognitive Science Computational Foundations of Cognitive Science Lecture 16: Models of Object Recognition Frank Keller School of Informatics University of Edinburgh keller@inf.ed.ac.uk February 23, 2010 Frank Keller Computational

More information

Correspondence and Stereopsis. Original notes by W. Correa. Figures from [Forsyth & Ponce] and [Trucco & Verri]

Correspondence and Stereopsis. Original notes by W. Correa. Figures from [Forsyth & Ponce] and [Trucco & Verri] Correspondence and Stereopsis Original notes by W. Correa. Figures from [Forsyth & Ponce] and [Trucco & Verri] Introduction Disparity: Informally: difference between two pictures Allows us to gain a strong

More information

Local invariant features

Local invariant features Local invariant features Tuesday, Oct 28 Kristen Grauman UT-Austin Today Some more Pset 2 results Pset 2 returned, pick up solutions Pset 3 is posted, due 11/11 Local invariant features Detection of interest

More information

URBAN STRUCTURE ESTIMATION USING PARALLEL AND ORTHOGONAL LINES

URBAN STRUCTURE ESTIMATION USING PARALLEL AND ORTHOGONAL LINES URBAN STRUCTURE ESTIMATION USING PARALLEL AND ORTHOGONAL LINES An Undergraduate Research Scholars Thesis by RUI LIU Submitted to Honors and Undergraduate Research Texas A&M University in partial fulfillment

More information

Image Formation. Antonino Furnari. Image Processing Lab Dipartimento di Matematica e Informatica Università degli Studi di Catania

Image Formation. Antonino Furnari. Image Processing Lab Dipartimento di Matematica e Informatica Università degli Studi di Catania Image Formation Antonino Furnari Image Processing Lab Dipartimento di Matematica e Informatica Università degli Studi di Catania furnari@dmi.unict.it 18/03/2014 Outline Introduction; Geometric Primitives

More information

9.913 Pattern Recognition for Vision. Class I - Overview. Instructors: B. Heisele, Y. Ivanov, T. Poggio

9.913 Pattern Recognition for Vision. Class I - Overview. Instructors: B. Heisele, Y. Ivanov, T. Poggio 9.913 Class I - Overview Instructors: B. Heisele, Y. Ivanov, T. Poggio TOC Administrivia Problems of Computer Vision and Pattern Recognition Overview of classes Quick review of Matlab Administrivia Instructors:

More information

Image Formation. Ed Angel Professor of Computer Science, Electrical and Computer Engineering, and Media Arts University of New Mexico

Image Formation. Ed Angel Professor of Computer Science, Electrical and Computer Engineering, and Media Arts University of New Mexico Image Formation Ed Angel Professor of Computer Science, Electrical and Computer Engineering, and Media Arts University of New Mexico 1 Objectives Fundamental imaging notions Physical basis for image formation

More information

(0, 1, 1) (0, 1, 1) (0, 1, 0) What is light? What is color? Terminology

(0, 1, 1) (0, 1, 1) (0, 1, 0) What is light? What is color? Terminology lecture 23 (0, 1, 1) (0, 0, 0) (0, 0, 1) (0, 1, 1) (1, 1, 1) (1, 1, 0) (0, 1, 0) hue - which ''? saturation - how pure? luminance (value) - intensity What is light? What is? Light consists of electromagnetic

More information

Tracking of Human Body using Multiple Predictors

Tracking of Human Body using Multiple Predictors Tracking of Human Body using Multiple Predictors Rui M Jesus 1, Arnaldo J Abrantes 1, and Jorge S Marques 2 1 Instituto Superior de Engenharia de Lisboa, Postfach 351-218317001, Rua Conselheiro Emído Navarro,

More information

Augmented Reality VU. Computer Vision 3D Registration (2) Prof. Vincent Lepetit

Augmented Reality VU. Computer Vision 3D Registration (2) Prof. Vincent Lepetit Augmented Reality VU Computer Vision 3D Registration (2) Prof. Vincent Lepetit Feature Point-Based 3D Tracking Feature Points for 3D Tracking Much less ambiguous than edges; Point-to-point reprojection

More information

Introduction to Computer Graphics with WebGL

Introduction to Computer Graphics with WebGL Introduction to Computer Graphics with WebGL Ed Angel Professor Emeritus of Computer Science Founding Director, Arts, Research, Technology and Science Laboratory University of New Mexico Image Formation

More information

A Summary of Projective Geometry

A Summary of Projective Geometry A Summary of Projective Geometry Copyright 22 Acuity Technologies Inc. In the last years a unified approach to creating D models from multiple images has been developed by Beardsley[],Hartley[4,5,9],Torr[,6]

More information

Lecture 1 Image Formation.

Lecture 1 Image Formation. Lecture 1 Image Formation peimt@bit.edu.cn 1 Part 3 Color 2 Color v The light coming out of sources or reflected from surfaces has more or less energy at different wavelengths v The visual system responds

More information

Motion Estimation and Optical Flow Tracking

Motion Estimation and Optical Flow Tracking Image Matching Image Retrieval Object Recognition Motion Estimation and Optical Flow Tracking Example: Mosiacing (Panorama) M. Brown and D. G. Lowe. Recognising Panoramas. ICCV 2003 Example 3D Reconstruction

More information

Noise Model. Important Noise Probability Density Functions (Cont.) Important Noise Probability Density Functions

Noise Model. Important Noise Probability Density Functions (Cont.) Important Noise Probability Density Functions Others -- Noise Removal Techniques -- Edge Detection Techniques -- Geometric Operations -- Color Image Processing -- Color Spaces Xiaojun Qi Noise Model The principal sources of noise in digital images

More information

Pop Quiz 1 [10 mins]

Pop Quiz 1 [10 mins] Pop Quiz 1 [10 mins] 1. An audio signal makes 250 cycles in its span (or has a frequency of 250Hz). How many samples do you need, at a minimum, to sample it correctly? [1] 2. If the number of bits is reduced,

More information

An introduction to 3D image reconstruction and understanding concepts and ideas

An introduction to 3D image reconstruction and understanding concepts and ideas Introduction to 3D image reconstruction An introduction to 3D image reconstruction and understanding concepts and ideas Samuele Carli Martin Hellmich 5 febbraio 2013 1 icsc2013 Carli S. Hellmich M. (CERN)

More information

Catadioptric camera model with conic mirror

Catadioptric camera model with conic mirror LÓPEZ-NICOLÁS, SAGÜÉS: CATADIOPTRIC CAMERA MODEL WITH CONIC MIRROR Catadioptric camera model with conic mirror G. López-Nicolás gonlopez@unizar.es C. Sagüés csagues@unizar.es Instituto de Investigación

More information

Multi-view stereo. Many slides adapted from S. Seitz

Multi-view stereo. Many slides adapted from S. Seitz Multi-view stereo Many slides adapted from S. Seitz Beyond two-view stereo The third eye can be used for verification Multiple-baseline stereo Pick a reference image, and slide the corresponding window

More information

COMPUTER AND ROBOT VISION

COMPUTER AND ROBOT VISION VOLUME COMPUTER AND ROBOT VISION Robert M. Haralick University of Washington Linda G. Shapiro University of Washington T V ADDISON-WESLEY PUBLISHING COMPANY Reading, Massachusetts Menlo Park, California

More information

Face Recognition At-a-Distance Based on Sparse-Stereo Reconstruction

Face Recognition At-a-Distance Based on Sparse-Stereo Reconstruction Face Recognition At-a-Distance Based on Sparse-Stereo Reconstruction Ham Rara, Shireen Elhabian, Asem Ali University of Louisville Louisville, KY {hmrara01,syelha01,amali003}@louisville.edu Mike Miller,

More information

Self-calibration of a pair of stereo cameras in general position

Self-calibration of a pair of stereo cameras in general position Self-calibration of a pair of stereo cameras in general position Raúl Rojas Institut für Informatik Freie Universität Berlin Takustr. 9, 14195 Berlin, Germany Abstract. This paper shows that it is possible

More information

Stereo. 11/02/2012 CS129, Brown James Hays. Slides by Kristen Grauman

Stereo. 11/02/2012 CS129, Brown James Hays. Slides by Kristen Grauman Stereo 11/02/2012 CS129, Brown James Hays Slides by Kristen Grauman Multiple views Multi-view geometry, matching, invariant features, stereo vision Lowe Hartley and Zisserman Why multiple views? Structure

More information

Think-Pair-Share. What visual or physiological cues help us to perceive 3D shape and depth?

Think-Pair-Share. What visual or physiological cues help us to perceive 3D shape and depth? Think-Pair-Share What visual or physiological cues help us to perceive 3D shape and depth? [Figure from Prados & Faugeras 2006] Shading Focus/defocus Images from same point of view, different camera parameters

More information

Perception. Autonomous Mobile Robots. Sensors Vision Uncertainties, Line extraction from laser scans. Autonomous Systems Lab. Zürich.

Perception. Autonomous Mobile Robots. Sensors Vision Uncertainties, Line extraction from laser scans. Autonomous Systems Lab. Zürich. Autonomous Mobile Robots Localization "Position" Global Map Cognition Environment Model Local Map Path Perception Real World Environment Motion Control Perception Sensors Vision Uncertainties, Line extraction

More information

3D graphics, raster and colors CS312 Fall 2010

3D graphics, raster and colors CS312 Fall 2010 Computer Graphics 3D graphics, raster and colors CS312 Fall 2010 Shift in CG Application Markets 1989-2000 2000 1989 3D Graphics Object description 3D graphics model Visualization 2D projection that simulates

More information

Illumination and Shading

Illumination and Shading Illumination and Shading Light sources emit intensity: assigns intensity to each wavelength of light Humans perceive as a colour - navy blue, light green, etc. Exeriments show that there are distinct I

More information

Epipolar geometry contd.

Epipolar geometry contd. Epipolar geometry contd. Estimating F 8-point algorithm The fundamental matrix F is defined by x' T Fx = 0 for any pair of matches x and x in two images. Let x=(u,v,1) T and x =(u,v,1) T, each match gives

More information

Behavior Learning for a Mobile Robot with Omnidirectional Vision Enhanced by an Active Zoom Mechanism

Behavior Learning for a Mobile Robot with Omnidirectional Vision Enhanced by an Active Zoom Mechanism Behavior Learning for a Mobile Robot with Omnidirectional Vision Enhanced by an Active Zoom Mechanism Sho ji Suzuki, Tatsunori Kato, Minoru Asada, and Koh Hosoda Dept. of Adaptive Machine Systems, Graduate

More information

Understanding Variability

Understanding Variability Understanding Variability Why so different? Light and Optics Pinhole camera model Perspective projection Thin lens model Fundamental equation Distortion: spherical & chromatic aberration, radial distortion

More information

Anno accademico 2006/2007. Davide Migliore

Anno accademico 2006/2007. Davide Migliore Robotica Anno accademico 6/7 Davide Migliore migliore@elet.polimi.it Today What is a feature? Some useful information The world of features: Detectors Edges detection Corners/Points detection Descriptors?!?!?

More information

Segmentation and Tracking of Partial Planar Templates

Segmentation and Tracking of Partial Planar Templates Segmentation and Tracking of Partial Planar Templates Abdelsalam Masoud William Hoff Colorado School of Mines Colorado School of Mines Golden, CO 800 Golden, CO 800 amasoud@mines.edu whoff@mines.edu Abstract

More information

Robotics Programming Laboratory

Robotics Programming Laboratory Chair of Software Engineering Robotics Programming Laboratory Bertrand Meyer Jiwon Shin Lecture 8: Robot Perception Perception http://pascallin.ecs.soton.ac.uk/challenges/voc/databases.html#caltech car

More information

Tracking and Recognizing People in Colour using the Earth Mover s Distance

Tracking and Recognizing People in Colour using the Earth Mover s Distance Tracking and Recognizing People in Colour using the Earth Mover s Distance DANIEL WOJTASZEK, ROBERT LAGANIÈRE S.I.T.E. University of Ottawa, Ottawa, Ontario, Canada K1N 6N5 danielw@site.uottawa.ca, laganier@site.uottawa.ca

More information

Announcements. Hough Transform [ Patented 1962 ] Generalized Hough Transform, line fitting. Assignment 2: Due today Midterm: Thursday, May 5 in class

Announcements. Hough Transform [ Patented 1962 ] Generalized Hough Transform, line fitting. Assignment 2: Due today Midterm: Thursday, May 5 in class Announcements Generalized Hough Transform, line fitting Assignment 2: Due today Midterm: Thursday, May 5 in class Introduction to Computer Vision CSE 152 Lecture 11a What is region like if: 1. λ 1 = 0?

More information

CSE 4392/5369. Dr. Gian Luca Mariottini, Ph.D.

CSE 4392/5369. Dr. Gian Luca Mariottini, Ph.D. University of Texas at Arlington CSE 4392/5369 Introduction to Vision Sensing Dr. Gian Luca Mariottini, Ph.D. Department of Computer Science and Engineering University of Texas at Arlington WEB : http://ranger.uta.edu/~gianluca

More information

Dept. of Adaptive Machine Systems, Graduate School of Engineering Osaka University, Suita, Osaka , Japan

Dept. of Adaptive Machine Systems, Graduate School of Engineering Osaka University, Suita, Osaka , Japan An Application of Vision-Based Learning for a Real Robot in RoboCup - A Goal Keeping Behavior for a Robot with an Omnidirectional Vision and an Embedded Servoing - Sho ji Suzuki 1, Tatsunori Kato 1, Hiroshi

More information

Multimedia Computing: Algorithms, Systems, and Applications: Edge Detection

Multimedia Computing: Algorithms, Systems, and Applications: Edge Detection Multimedia Computing: Algorithms, Systems, and Applications: Edge Detection By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854, USA Part of the slides

More information

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS Cognitive Robotics Original: David G. Lowe, 004 Summary: Coen van Leeuwen, s1460919 Abstract: This article presents a method to extract

More information

Target Tracking Using Mean-Shift And Affine Structure

Target Tracking Using Mean-Shift And Affine Structure Target Tracking Using Mean-Shift And Affine Structure Chuan Zhao, Andrew Knight and Ian Reid Department of Engineering Science, University of Oxford, Oxford, UK {zhao, ian}@robots.ox.ac.uk Abstract Inthispaper,wepresentanewapproachfortracking

More information

Computer Vision Course Lecture 02. Image Formation Light and Color. Ceyhun Burak Akgül, PhD cba-research.com. Spring 2015 Last updated 04/03/2015

Computer Vision Course Lecture 02. Image Formation Light and Color. Ceyhun Burak Akgül, PhD cba-research.com. Spring 2015 Last updated 04/03/2015 Computer Vision Course Lecture 02 Image Formation Light and Color Ceyhun Burak Akgül, PhD cba-research.com Spring 2015 Last updated 04/03/2015 Photo credit: Olivier Teboul vision.mas.ecp.fr/personnel/teboul

More information

3D Model Acquisition by Tracking 2D Wireframes

3D Model Acquisition by Tracking 2D Wireframes 3D Model Acquisition by Tracking 2D Wireframes M. Brown, T. Drummond and R. Cipolla {96mab twd20 cipolla}@eng.cam.ac.uk Department of Engineering University of Cambridge Cambridge CB2 1PZ, UK Abstract

More information

CIS 580, Machine Perception, Spring 2015 Homework 1 Due: :59AM

CIS 580, Machine Perception, Spring 2015 Homework 1 Due: :59AM CIS 580, Machine Perception, Spring 2015 Homework 1 Due: 2015.02.09. 11:59AM Instructions. Submit your answers in PDF form to Canvas. This is an individual assignment. 1 Camera Model, Focal Length and

More information

CHAPTER 9. Classification Scheme Using Modified Photometric. Stereo and 2D Spectra Comparison

CHAPTER 9. Classification Scheme Using Modified Photometric. Stereo and 2D Spectra Comparison CHAPTER 9 Classification Scheme Using Modified Photometric Stereo and 2D Spectra Comparison 9.1. Introduction In Chapter 8, even we combine more feature spaces and more feature generators, we note that

More information