To Conclude: Vision in terms of
- Neurophysiology: receptive fields; left/right hemisphere; visual pathway, packing problem, columns, complementary features
- Cognitive psychology: perceptual grouping; bottom-up vs. top-down processes; optical illusions; hemispheres, motion perception
- Information processing: the Marr paradigm
David Marr [1945-1980], Vision [1982]: "Vision: A Computational Investigation into the Human Representation and Processing of Visual Information", Freeman & Co., 1982
David Marr: Vision (Institute of Electrical Measurement and Measurement Signal Processing)
David Marr: Vision
"What does it mean, to see? The plain man's answer (and Aristotle's, too) would be, to know what is where by looking. In other words, vision is the process of discovering from images what is present in the world, and where it is." (p. 3, 1st paragraph of General Introduction)
- Vision, Image Understanding: "To know what is where." (D. Marr); 3D, 2D; reconstruction vs. recognition
- Video Understanding: "What is where and when?" (borrowed from D. Marr); in space and time; 4D, 3D
Marr, Vision: Emphasis on Reconstruction
The Marr Paradigm: Computational Framework
Stone (Vision and Brain, 2012): the term "computational framework" suggests that vision works like a computer. A better term is "informational framework", because Marr was keen to emphasize the nature of the information being processed, without necessarily referring to the particular machinery (e.g., neurons or chips).
The Marr Paradigm: Analogy with Flying
Marr (p. 27), on the importance of computational theory: "an algorithm is likely to be understood more readily by understanding the nature of the problem being solved than by examining the mechanism (and the hardware) in which it is embodied."
"trying to understand perception by studying only neurons is like trying to understand bird flight by studying only feathers: It just cannot be done."
First understand aerodynamics, then think about the structure of feathers, the shape of wings, etc.
[Figure: Wright brothers, 1902 (from [Stone, 2012])]
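The Marr Paradigm: Computational Framework
- Task: 3D surface shape by finding surface normals from shading information
- Method: shape from shading (SfS), going from grey levels to surface normals
- Machinery: neurons, a single CPU, multicore CPUs, GPUs, or even "frogs passing cupcakes" [Stone]
[Figure: grey levels and surface normals, with tick labels 0, 90, 180]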
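To make the shape-from-shading idea on this slide concrete, here is a minimal sketch under strong assumptions that are not part of the slide: a Lambertian surface, known albedo, and a single known light direction. The function names `lambertian_grey` and `slant_from_grey` are hypothetical. Under these assumptions a grey level only fixes the angle between surface normal and light direction; the full normal stays ambiguous, which is why practical SfS methods need additional constraints.

```python
import math

def lambertian_grey(normal, light=(0.0, 0.0, 1.0), albedo=1.0):
    """Forward model (assumed Lambertian): grey level = albedo * max(0, cos(angle))."""
    n_len = math.sqrt(sum(c * c for c in normal))
    l_len = math.sqrt(sum(c * c for c in light))
    cos_angle = sum(n * l for n, l in zip(normal, light)) / (n_len * l_len)
    return albedo * max(0.0, cos_angle)

def slant_from_grey(grey, albedo=1.0):
    """Inverse step: angle in degrees between the surface normal and the light direction."""
    ratio = min(1.0, max(0.0, grey / albedo))  # clamp against noise
    return math.degrees(math.acos(ratio))

# A patch whose normal is tilted 60 degrees away from a frontal light source.
tilted = (math.sin(math.radians(60)), 0.0, math.cos(math.radians(60)))
grey = lambertian_grey(tilted)
angle = slant_from_grey(grey)
```

The same computation could run on any of the machinery listed on the slide, which is the point of separating the task from its implementation.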
David Marr, Vision: Representational Framework: primal sketch, 2½D sketch, 3D model
David Marr, Vision: Representational Framework: primal sketch, 2½D sketch (viewer centered), 3D model (object centered)
Marr: Primal Sketch
Saliency! Compare today's interest point, line, and edge detection, etc. Raw primal sketch; full primal sketch (includes grouping).
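As a rough illustration of what an edge token in a raw primal sketch might look like, the following sketch marks large grey-level steps along a single scanline. This is a deliberately simplified stand-in, not Marr's operator and not a modern detector; the function name `edge_positions` and the threshold value are assumptions made for this example.

```python
def edge_positions(scanline, threshold):
    """Return (index, polarity) for every step between neighbouring pixels above threshold."""
    edges = []
    for i in range(len(scanline) - 1):
        step = scanline[i + 1] - scanline[i]
        if abs(step) > threshold:
            edges.append((i, "rising" if step > 0 else "falling"))
    return edges

# A dark background with one bright bar in the middle.
scanline = [10, 10, 10, 200, 200, 200, 10, 10]
tokens = edge_positions(scanline, threshold=50)
```

Each returned token records where an intensity change occurs and its polarity; grouping such tokens would be the step towards a full primal sketch.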
Marr: 2½D Sketch
Surface patches (surface normals), depth discontinuities
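The two ingredients named on this slide can be sketched from a depth map. The example below assumes the depth map is a height field z(x, y) stored as a list of rows, with z pointing towards the viewer, and uses finite differences; the function names and the `max_jump` parameter are hypothetical.

```python
import math

def surface_normals(depth):
    """Unit normals (-dz/dx, -dz/dy, 1)/length for interior pixels, via central differences."""
    rows, cols = len(depth), len(depth[0])
    normals = {}
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            dzdx = (depth[y][x + 1] - depth[y][x - 1]) / 2.0
            dzdy = (depth[y + 1][x] - depth[y - 1][x]) / 2.0
            length = math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0)
            normals[(x, y)] = (-dzdx / length, -dzdy / length, 1.0 / length)
    return normals

def depth_discontinuities(depth, max_jump):
    """Pairs of adjacent pixels (horizontal or vertical) whose depth differs by more than max_jump."""
    rows, cols = len(depth), len(depth[0])
    jumps = []
    for y in range(rows):
        for x in range(cols):
            if x + 1 < cols and abs(depth[y][x + 1] - depth[y][x]) > max_jump:
                jumps.append(((x, y), (x + 1, y)))
            if y + 1 < rows and abs(depth[y + 1][x] - depth[y][x]) > max_jump:
                jumps.append(((x, y), (x, y + 1)))
    return jumps

ramp = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]   # a plane rising along x
step = [[1, 1, 5], [1, 1, 5]]              # a depth step between columns 1 and 2
ramp_normals = surface_normals(ramp)
step_edges = depth_discontinuities(step, max_jump=2)
```

Both results are tied to pixel positions in the image, which reflects the viewer-centered nature of this representation.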
Marr: 3D Model Representation
Generalized cylinder, generalized cone; 3D hierarchical models
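One way to picture a hierarchical 3D model built from generalized cones is a small tree data structure. The sketch below is a simplification under stated assumptions: straight axes, circular cross-sections, and a radius given as a function of the position t in [0, 1] along the axis. The class `GeneralizedCone`, the helper `example_figure`, and all dimensions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class GeneralizedCone:
    name: str
    axis_length: float
    radius_at: Callable[[float], float]  # radius at position t in [0, 1] along the axis
    parts: List["GeneralizedCone"] = field(default_factory=list)

    def depth(self) -> int:
        """Number of levels in the hierarchy rooted at this part."""
        return 1 + max((p.depth() for p in self.parts), default=0)

    def part_names(self) -> List[str]:
        """All part names, coarse to fine, in depth-first order."""
        names = [self.name]
        for p in self.parts:
            names.extend(p.part_names())
        return names

def example_figure() -> GeneralizedCone:
    """Illustrative model with made-up dimensions: body > arm > forearm."""
    forearm = GeneralizedCone("forearm", 0.3, lambda t: 0.04 * (1.0 - 0.3 * t))  # tapering cone
    arm = GeneralizedCone("arm", 0.6, lambda t: 0.05, parts=[forearm])           # cylinder
    return GeneralizedCone("body", 1.8, lambda t: 0.2, parts=[arm])

figure = example_figure()
```

A constant `radius_at` gives a generalized cylinder, while a varying one gives a generalized cone; nesting parts provides the coarse-to-fine hierarchy, described relative to the object rather than the viewer.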
Defining the Terms: Image Understanding + Video Understanding
[Diagram relating Image and Scene description via Image Processing, Image Understanding, and Computer Graphics]
Scene Description
Please describe this scene: there are many possible (and correct!) descriptions. The correct/best description may depend on the particular goal(s): purposive, qualitative, active vision [Aloimonos, 1992].
My Model of Image Understanding [Pinz, 1994]: representations, processes, data flow, control flow
My Model of Image Understanding
Up to WS 2014/15:
- Mostly 2D
- Image understanding
This course:
- Can this be extended towards video understanding?
KU: 2D image and scene description
2D Scene Description
Examples: houses [Matsuyama 90], face [Brunelli 92], pedestrians [Suzuki 90]. 2D image objects, tokens.
2D (+time!) Video Description
Fast object segmentation in unconstrained video [Papazoglou & Ferrari, ICCV 13] http://groups.inf.ed.ac.uk/calvin/fastvideosegmentation/
3D Scene Description
Scene coordinate system S; Object 1; Object 2
3D (+time!) Video Description
More Definition: Visual Recognition [Perona 09]
The holy grail of Computer Vision. Five tasks of visual recognition, with complexity increasing from top to bottom:
- Verification (is a car in the image?)
- Detection and localization (what is there? where?)
- Classification (n beach images, m city images)
- Naming (name and locate all objects in an image)
- Description: objects, actions, relations, etc. (example: kissing scene understanding)
Image and Video Understanding: mostly 2D (+time) recognition
Image-based Measurement: 3D (+time) reconstruction
2D Scene Representation and Description
You can get very far in 2D! Image segmentation, image description, 2D image object, token, tokenset, 2D grouping, 2D scene description.
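The step from tokens to tokensets via 2D grouping can be sketched with a simple proximity criterion. This is only one possible grouping rule, chosen here for brevity: tokens are assumed to be 2D points, and two tokens belong to the same tokenset if they are connected by a chain of neighbours no further apart than `max_distance`. The function name `group_tokens` and the parameter are hypothetical.

```python
import math

def group_tokens(tokens, max_distance):
    """Group 2D point tokens into tokensets by chained proximity (single linkage)."""
    unvisited = set(range(len(tokens)))
    tokensets = []
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        group, frontier = [seed], [seed]
        while frontier:
            cx, cy = tokens[frontier.pop()]
            near = [j for j in unvisited
                    if math.hypot(tokens[j][0] - cx, tokens[j][1] - cy) <= max_distance]
            for j in near:
                unvisited.remove(j)
                group.append(j)
                frontier.append(j)
        tokensets.append(sorted(group))
    return tokensets

# Two clusters of tokens: a row of three and a distant pair.
tokens = [(0, 0), (1, 0), (2, 0), (10, 10), (11, 10)]
tokensets = group_tokens(tokens, max_distance=1.5)
```

Each tokenset is a list of token indices; a 2D scene description could then attach labels or relations to these groups.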