EXAM SOLUTIONS. Computer Vision Course 2D1420, Thursday, 11th of March 2003, 8.00-13.00


Numerical Analysis and Computer Science, KTH
Danica Kragic

EXAM SOLUTIONS
Computer Vision Course 2D1420
Thursday, 11th of March 2003, 8.00-13.00

Exercise 1 (5*2=10 credits)

Answer at most 5 of the following questions. If you answer more than five, only the first five answers will be considered and the rest ignored.

(1) What is the fovea and what is its function? Describe what cones and rods are.

Fovea: a shallow pit in the retina, close to the blind spot and a fraction of a mm in diameter, with a high concentration of photoreceptors. It supports high-resolution vision in bright light; the gaze is directed so that the point of interest projects onto the fovea.

Cones (tappar): used for photopic (light) vision, concentrated in the fovea centralis of the macula, three colour types - red, green, blue - require bright light, relatively sparse outside the fovea, wavelength specific.

Rods (stavar): used for scotopic (dark) vision, distributed over the entire retina with a peak at about 20 degrees from the fovea centralis, similar sensitivity with respect to colour (not wavelength specific), very sensitive to light (down to the quantum level). Each eye contains about 120 million rods and 5 million cones.

(2) Explain the terms radiance and irradiance.

Radiance: amount of light radiated from a surface into a given solid angle, per unit area. The area is the foreshortened area, as seen from the direction in which the light is emitted.

Irradiance: light power per unit area (watts per square meter) incident on a surface. If the surface tilts away from the light, the same amount of light strikes a bigger surface - foreshortening (less irradiance).

(3) Explain the terms sampling and quantization.

Sampling: selection of a discrete grid to represent an image; spatial discretization of an image.

Quantization: mapping of the brightness into a numerical value; assigning a physical measurement to one of a discrete set of points in a range.
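A minimal numerical illustration of sampling and quantization may make question (3) concrete. The following Python/NumPy sketch is not part of the exam; the test pattern, the grid size N and the number of grey levels L are arbitrary assumed values:

    import numpy as np

    # "Continuous" brightness pattern, here an arbitrary function of image coordinates.
    f = lambda x, y: 0.5 + 0.5 * np.sin(2 * np.pi * x) * np.cos(2 * np.pi * y)

    # Sampling: evaluate the pattern on a discrete grid (spatial discretization).
    N = 8                                              # assumed grid size
    xs, ys = np.meshgrid(np.linspace(0, 1, N), np.linspace(0, 1, N))
    samples = f(xs, ys)                                # real-valued samples in [0, 1]

    # Quantization: map each brightness value to one of L discrete levels.
    L = 256                                            # assumed number of grey levels
    quantized = np.round(samples * (L - 1)).astype(np.uint8)

    print(samples[0, :3], quantized[0, :3])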

(4) What are common distance measures between image points? Give at least two examples.

Common distance measures, for points p = (x, y) and q = (u, v) (see the sketch at the end of this exercise):

Euclidean distance: $d(p,q) = \sqrt{(x-u)^2 + (y-v)^2}$

City block distance: $d(p,q) = |x-u| + |y-v|$

Chessboard distance: $d(p,q) = \max(|x-u|, |y-v|)$

(5) What is meant by contrast reversal in terms of grey-level transformations? Draw the corresponding linear transformation.

Contrast reversal - basically inverting the grey-level values: for an image g with grey levels normalized to $[0,1]$, $f(x) = 1 - g(x)$, i.e. a linear grey-level mapping with slope $-1$.

(6) How is the Fourier transform of a signal affected by a) translation and b) mirroring of the original signal in the spatial domain?

Translation affects only the phase, while the power spectrum is translation invariant. Mirroring in the spatial domain corresponds to mirroring in the spectral domain.

(7) What is meant by an ideal low-pass filter? Is this filter suitable to use in image processing? If yes, give an example of its application. If no, explain why.

The ideal low-pass filter is a mathematically idealized version of a smoothing filter. In a frequency-domain representation of the image, all frequencies above a threshold F are discarded, i.e. the filter passes only low frequencies, so the image becomes blurred. This method of smoothing tends to create images with ringing at sharp boundaries in the picture, which is its drawback. The observation that the application of the low-pass filter is equivalent to the convolution of the image with the sinc function provides an explanation for this phenomenon. This filter cannot be implemented exactly in practice (its impulse response has infinite spatial support).

Ideal low-pass filter in 2D: $\hat h(\nu) = \mathrm{rect}\!\left(\frac{|\nu|}{2\nu_c}\right)$, where $|\nu| = \sqrt{\nu_1^2 + \nu_2^2}$ and $\nu_c$ is the cut-off frequency.

Impulse response: $h(x) = 2\pi\nu_c^2 \, \frac{J_1(2\pi\nu_c |x|)}{2\pi\nu_c |x|}$, where $J_1$ is the first-order Bessel function.

(8) In what cases is spectral filtering more appropriate than spatial filtering? Give examples.

1) If the noise is periodic, 2) if we want to filter the image using large kernels.

(9) What is meant by region growing? When is it commonly used?

Region growing - homogeneous regions grow in size by including similar neighbouring pixels; the final result does not necessarily need to cover the entire image. First, seed regions have to be extracted, and these seed regions are iteratively grown at their borders by accepting new pixels that are consistent with the pixels already contained in the region. After each iteration, the homogeneity value of the region has to be re-calculated using also the new pixels. The results of region growing depend heavily on a proper selection of seed points.

Usage: dividing image pixels into a set of classes based on local properties and features (classification), matching in stereo, split-and-merge transform, etc.

(10) What is a region adjacency graph? What does merging correspond to in terms of a region adjacency graph?

A graph whose nodes represent the regions of a segmentation and whose edges connect adjacent regions; it is used to store region attributes for every region - related to the level of homogeneity of the regions. (Image provided in lectures.) Merging two regions corresponds to graph (edge) contraction.
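The three distance measures in question (4) can be written out directly. A small Python sketch (the function names are mine, not from the course material):

    import numpy as np

    def euclidean(p, q):
        (x, y), (u, v) = p, q
        return np.sqrt((x - u) ** 2 + (y - v) ** 2)

    def city_block(p, q):
        (x, y), (u, v) = p, q
        return abs(x - u) + abs(y - v)

    def chessboard(p, q):
        (x, y), (u, v) = p, q
        return max(abs(x - u), abs(y - v))

    p, q = (2, 3), (5, 7)
    print(euclidean(p, q), city_block(p, q), chessboard(p, q))   # 5.0 7 4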

Exercise 2 (2+2+2+1=7 credits)

(a) Describe the basics of, and draw figures for, the perspective and orthographic camera models. Given a set of parallel lines in 3D, explain what the difference is in their image projections for the two models.

Perspective projection: $\frac{x}{f} = \frac{X}{Z}$, $\quad \frac{y}{f} = \frac{Y}{Z}$

Orthographic projection: $x = X$, $\quad y = Y$

For the orthographic camera model, parallel lines remain parallel; for a perspective camera model, the lines intersect in a vanishing point (illustrated in the sketch after this exercise).

(Figure: perspective projection of a scene point through the focal point onto the image plane, and orthographic projection by parallel rays onto the image plane.)

(b) Under what conditions will a set of parallel lines viewed with a pinhole camera have its vanishing point (in the image) at infinity?

When the lines lie in a plane parallel to the image plane.

(c) If the area of a planar square in the scene is A, what is its area in an image under orthographic projection? Give your answer in terms of any parameters necessary to define this relationship, specifying what each parameter means.

$A\cos\theta$, where $\theta$ is the angle between the surface normal and the optical axis.
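A small Python/NumPy sketch of the two projection models in (a) and (b); the focal length and the two parallel 3D lines are arbitrary example values. It shows that under perspective projection the image points of parallel lines approach a common vanishing point, while under orthographic projection they stay parallel:

    import numpy as np

    f = 1.0                                    # assumed focal length
    d = np.array([0.0, 0.0, 1.0])              # common 3D direction of the two lines
    p1 = np.array([1.0, 0.0, 2.0])             # a point on the first line
    p2 = np.array([2.0, 1.0, 2.0])             # a point on the second, parallel line

    def perspective(P):
        X, Y, Z = P
        return f * X / Z, f * Y / Z

    def orthographic(P):
        X, Y, Z = P
        return X, Y

    for t in [0.0, 10.0, 1000.0]:              # move along the lines
        print(perspective(p1 + t * d), perspective(p2 + t * d))
    # both sequences converge to the vanishing point f*(d_x/d_z, d_y/d_z) = (0, 0)

    print(orthographic(p1), orthographic(p1 + 10 * d))   # orthographic: unchanged along the line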

(d) What is the difference between an affine and a perspective camera model?

An affine camera can be seen as a linear model, while a perspective camera is a non-linear one. Affine projection preserves parallel lines, perspective projection does not. The affine camera is a good model when all the points lie on a relatively planar surface (small depth variation) and their distance from the camera is large. The affine camera has 8 parameters, the perspective camera 12 parameters.

Exercise 3 (2+3+2+2=9 credits)

(a) An image has been smoothed using the following kernel: $k\,[1\;5\;10\;10\;5\;1]$. Can repeated convolutions of an image with the kernel $\frac{1}{2}[1\;1]$ be used to obtain the same result as with the first kernel? If yes, how many convolutions are needed? If no, explain the reasons why. What should the constant k be so that the filter gain is equal to 1?

Yes: five convolutions with $\frac{1}{2}[1\;1]$ (equivalently, the kernel $[1\;1]$ convolved with itself four times) give the same filter, and $k = \frac{1}{32}$ since $1+5+10+10+5+1 = 32$ (verified numerically in the sketch at the end of this exercise).

(b) Show that the kernel $k\,[-1\;{-2}\;0\;2\;1]$ can be written as multiple convolutions of a low-pass filter $\frac{1}{2}[1\;1]$ and a high-pass filter $[1\;{-1}]$. What should k be in order for the kernel to be a good approximation of the first order derivative? What is the frequency response of this kernel?

Yes, since:

$[1\;1] * [1\;1] = [1\;2\;1]$

$[1\;2\;1] * [1\;1] = [1\;3\;3\;1]$

$[1\;3\;3\;1] * [1\;{-1}] = [1\;2\;0\;{-2}\;{-1}]$

It follows that $|k| = \frac{1}{8}$, because three convolutions with the normalized low-pass filter $\frac{1}{2}[1\;1]$ have been performed and the maximum value should not change. Also, to obtain the kernel $[-1\;{-2}\;0\;2\;1]$ the sign has to be negative, so $k = -\frac{1}{8}$.

Frequency response (according to lectures):

$-e^{-2i\omega} - 2e^{-i\omega} + 2e^{i\omega} + e^{2i\omega} = 4i\sin\omega + 2i\sin 2\omega = i\,(4\sin\omega + 2\sin 2\omega)$

(c) Give an example of a mean (unweighted averaging) filter. Is a mean filter in general separable? Why do we prefer separable filters?

Let us take the simplest example:

$\frac{1}{9}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$

Since unweighted averaging is assumed, all elements in the matrix are equal. Such a filter is always separable, since its rank is equal to one. Separable filters are preferred since they can be implemented as repeated convolutions with one-dimensional kernels, which is computationally cheaper.

(d) Given an image below, before (left) and after (right) a smoothing filter was applied. The size of the filter is shown as a small square in the upper-left corner of the image (as you can see, its size is rather small compared to the image size). In your opinion, which one of the following filter types most likely produced the image on the right: 1) mean (unweighted averaging) filter, 2) median filter, or 3) Gaussian filter? Motivate your answer.

(Images provided in the exam.)

The median filter, since the thin line on the left of the image disappears - with either of the other filters it would remain and merely be blurred and extended.
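The kernel identities used in (a) and (b) and the separability claim in (c) are easy to verify numerically. A NumPy sketch (np.convolve performs full discrete convolution):

    import numpy as np

    # (a) five convolutions with [1, 1]/2 give the 6-tap binomial kernel divided by 32
    k = np.array([1.0])
    for _ in range(5):
        k = np.convolve(k, np.array([1.0, 1.0]) / 2)
    print(k * 32)        # [ 1.  5. 10. 10.  5.  1.]

    # (b) three low-pass kernels [1, 1]/2 followed by one high-pass kernel [1, -1]
    d = np.array([1.0])
    for _ in range(3):
        d = np.convolve(d, np.array([1.0, 1.0]) / 2)
    d = np.convolve(d, np.array([1.0, -1.0]))
    print(d * 8)         # [ 1.  2.  0. -2. -1.]  ->  d = (-1/8) * [-1, -2, 0, 2, 1], the kernel in (b)

    # (c) an unweighted 3x3 mean filter is separable: rank 1, an outer product of 1D kernels
    mean3 = np.ones((3, 3)) / 9
    print(np.linalg.matrix_rank(mean3))                                     # 1
    print(np.allclose(mean3, np.outer(np.ones(3) / 3, np.ones(3) / 3)))     # True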

Exercise 4 (2+2+1=5 credits)

You are given an image of an object on a background containing a strong illumination gradient. You are supposed to segment the object from the background in order to estimate its moment descriptors.

(a) Sketch the histogram of the image and explain what the problems are related to choosing a suitable threshold.

The histogram is not clearly bimodal - it is difficult to find a local minimum to use as the threshold.

(Figure: sketch of the image histogram, without a clear valley between object and background.)

(b) Propose a suitable methodology that could be used to perform successful segmentation.

a) Local adaptive thresholding selects an individual threshold for each pixel based on the range of intensity values in its local neighbourhood. This allows thresholding of an image whose global intensity histogram doesn't contain distinctive peaks (see the sketch after this exercise). b) Estimating the illumination gradient and subtracting it from the image. c) Estimating the gradient close to the object boundaries and using this to perform adaptive thresholding.

(c) You have applied histogram equalization to an image. If you apply histogram equalization to the image a second time, will you improve the image quality even more? Motivate your answer.

No, since the first histogram equalization operation will flatten the histogram as much as possible. The second application will not change the histogram significantly.
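A minimal sketch of the local adaptive thresholding idea from (b), assuming SciPy is available. The window size, the offset and the toy image are arbitrary assumed values, and the local statistic is simply the neighbourhood mean; for this simple version the window has to be larger than the object:

    import numpy as np
    from scipy.ndimage import uniform_filter

    def adaptive_threshold(image, window=61, offset=20.0):
        """Mark pixels that are clearly brighter than the mean of their local neighbourhood."""
        local_mean = uniform_filter(image.astype(float), size=window)
        return image > (local_mean + offset)

    # toy example: a bright square on a background with a strong illumination gradient
    h, w = 128, 128
    background = np.linspace(50, 200, w)[None, :] * np.ones((h, 1))
    image = background.copy()
    image[40:80, 40:80] += 60.0                 # the "object"
    mask = adaptive_threshold(image)            # True on the object, False on the gradient background
    print(mask.sum())                           # ~1600, i.e. roughly the 40x40 object

A single global threshold would fail here, because the right end of the background is brighter than the object on the left end.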

Exercise 5 (2+2+2=6 credits)

(a) What is the epipolar constraint and how can it be used in stereo matching?

It represents the geometry of two cameras and reduces the correspondence problem to a 1D search along an epipolar line. A point in one view generates an epipolar line in the other view; the corresponding point lies on this line. Epipolar geometry is a result of the coplanarity of the two camera centers and a world point - all of them lie in an epipolar plane.

(Figure: epipolar geometry with a world point X, its image points p, p' and q, q', and the two centers of projection o, o'.)

(b) Assume a pair of parallel cameras with their optical axes perpendicular to the baseline. What do the epipolar lines look like? Where are the epipoles for this type of camera system?

The epipolar lines are parallel to each other and to the baseline (for a horizontal baseline they run along the image x axis). If the optical centers of the cameras are at the same height, the corresponding epipolar lines are at the same height - one line for both images. The epipoles are at infinity.

(c) Estimate the essential matrix between two consecutive images for a forward translating camera. What is the equation of the epipolar line for the point p = [x y 1]?

In general $E = t_S R$, where $t_S$ is the skew-symmetric matrix of the translation vector. For a forward translating camera (see figure), we have $R = I$ and

$t_S = \begin{pmatrix} 0 & -t_Z & 0 \\ t_Z & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$, $\quad$ therefore $\quad E = \begin{pmatrix} 0 & -t_Z & 0 \\ t_Z & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$

From $l = E\,p$, the epipolar line for a point $p = [x\;y\;1]^T$ is

$l = \begin{pmatrix} 0 & -t_Z & 0 \\ t_Z & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} -t_Z\,y \\ t_Z\,x \\ 0 \end{pmatrix}$
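The algebra in (c) can be checked with a few lines of NumPy; t_Z and the image point are arbitrary example values:

    import numpy as np

    def skew(t):
        """Skew-symmetric matrix [t]_x such that skew(t) @ v equals the cross product t x v."""
        return np.array([[0.0, -t[2], t[1]],
                         [t[2], 0.0, -t[0]],
                         [-t[1], t[0], 0.0]])

    t_Z = 2.0                               # assumed forward translation
    t = np.array([0.0, 0.0, t_Z])
    R = np.eye(3)                           # pure forward translation: no rotation
    E = skew(t) @ R                         # essential matrix

    p = np.array([3.0, 4.0, 1.0])           # image point in homogeneous coordinates
    l = E @ p                               # epipolar line [-t_Z*y, t_Z*x, 0]
    print(E)
    print(l)                                # [-8.  6.  0.]
    # the third coordinate is zero, so the line passes through the image origin,
    # which is the epipole (focus of expansion) for forward translation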

Exercise 6 (2+2+3=7 credits)

(a) What is the difference between the motion field and optical flow?

Motion field: the projection of the 3D motion onto the image plane. Optical flow: the apparent motion of the image brightness pattern (not necessarily related to the motion field).

(b) Under what assumptions does the optical flow constraint work? When does it not work?

Brightness constancy: does not work if the light source moves while the object is static. Textured region: does not work without sufficient local image structure. Alternatively, Lambertian assumption: might not work if objects have specularities (mirror-like objects). Alternatively, spatially local constancy: necessary in practice, since derivatives have to be determined within some region.

(c) Assume that an object located at a distance of 10 m is moving at a speed of 3 m/s in a direction parallel to the image plane, as indicated in the figure. How fast should the camera be rotated around the x and y axes so that the object remains fixated in the centre of the image? Assume that the focal length is f = 5.

(Figure: the object velocity makes a 30 degree angle with the x axis in the image plane.)

The motion of the object in 3D is

$\dot X = 3\cos 30^\circ\ \mathrm{m\,s^{-1}} = \tfrac{3\sqrt{3}}{2}\ \mathrm{m\,s^{-1}} \approx 2.6\ \mathrm{m\,s^{-1}}$, $\quad \dot Y = 3\sin 30^\circ\ \mathrm{m\,s^{-1}} = \tfrac{3}{2}\ \mathrm{m\,s^{-1}} = 1.5\ \mathrm{m\,s^{-1}}$

The translational motion field is given by

$u_t = f\,\dot X/Z = f\cdot 3\sqrt{3}/20\ \mathrm{s^{-1}}$, $\quad v_t = f\,\dot Y/Z = f\cdot 3/20\ \mathrm{s^{-1}}$

and the rotational motion field is

$u_r = f\,\omega_y$, $\quad v_r = f\,\omega_x$

Since the motion field is supposed to be zero, $u = u_t + u_r = 0$ and $v = v_t + v_r = 0$, i.e. $u_r = -u_t$ and $v_r = -v_t$,

the rotation of the camera therefore has to have magnitude

$|\omega_x| = v_t/f = 3/20\ \mathrm{rad\,s^{-1}} = 0.15\ \mathrm{rad\,s^{-1}} \approx 8.6^\circ/\mathrm{s}$, $\quad |\omega_y| = u_t/f = 3\sqrt{3}/20\ \mathrm{rad\,s^{-1}} \approx 0.26\ \mathrm{rad\,s^{-1}} \approx 14.9^\circ/\mathrm{s}$,

with the rotation directions chosen so that the rotational flow cancels the translational flow.

Exercise 7 (3+3=6 credits)

Assume that we want to classify an image into one of two classes: $C_A$ and $C_B$. We know that the prior probability for A is twice as large as that for B: $p(A) = \tfrac{2}{3}$ and $p(B) = \tfrac{1}{3}$. After the preprocessing step we get a feature map z. Calculate an expression for a Bayesian classifier:

(a) in the case of a one-dimensional feature map, with $\sigma_A = 4$, $\sigma_B = 1$, $m_A = m_B = 0$, and

$p(z|C_k) = \frac{1}{\sqrt{2\pi}\,\sigma_k}\,e^{-(z-m_k)^2/(2\sigma_k^2)}.$

Sketch the decision functions and calculate the decision boundaries.

Using Bayes' rule (without the normalization), the decision boundary satisfies

$p(C_A)\,p(z|C_A) = p(C_B)\,p(z|C_B)$

Using the above equation for the PDFs and the priors:

$\frac{2}{3\sqrt{2\pi}\cdot 4}\,e^{-z^2/32} = \frac{1}{3\sqrt{2\pi}}\,e^{-z^2/2}$

which gives $z^2 = \frac{32\ln 2}{15} \approx 1.48$, i.e. the decision boundaries

$z = \pm 1.216$

The decision function is: class A if $|z| > 1.216$, class B if $|z| < 1.216$ (class A, which has the larger variance, wins in the tails).
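The decision boundary in (a) can be verified numerically. A NumPy sketch (variable names are mine):

    import numpy as np

    pA, pB = 2 / 3, 1 / 3
    sA, sB = 4.0, 1.0                      # standard deviations; both means are zero

    def posterior_ratio(z):
        """Unnormalized ratio p(A)p(z|A) / p(B)p(z|B)."""
        gA = pA / (np.sqrt(2 * np.pi) * sA) * np.exp(-z ** 2 / (2 * sA ** 2))
        gB = pB / (np.sqrt(2 * np.pi) * sB) * np.exp(-z ** 2 / (2 * sB ** 2))
        return gA / gB

    # solving ln(1/2) - z^2/32 = -z^2/2 gives z^2 = 32 ln 2 / 15
    z_star = np.sqrt(32 * np.log(2) / 15)
    print(z_star)                          # ~1.216
    print(posterior_ratio(z_star))         # ~1.0 at the boundary
    print(posterior_ratio(0.0) < 1, posterior_ratio(3.0) > 1)   # B wins near 0, A wins in the tails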

(b) in the case of a two-dimensional feature map, with

$\Sigma_A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$, $\quad \Sigma_B = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}$, $\quad m_A = m_B = 0$, and

$p(z|C_k) = \frac{1}{2\pi\,|\det\Sigma_k|^{1/2}}\,e^{-(z-m_k)^T\Sigma_k^{-1}(z-m_k)/2}.$

Calculate the two-dimensional decision boundary.

From the above we have:

$\det\Sigma_A = 2$, $\quad \Sigma_A^{-1} = \frac{1}{2}\begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}$, $\quad \det\Sigma_B = 2$, $\quad \Sigma_B^{-1} = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$

Again, using Bayes' rule without normalization:

$\frac{1}{3\pi\sqrt{2}}\,e^{-z^T\begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}z/4} = \frac{1}{6\pi\sqrt{2}}\,e^{-z^T\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}z/4}$

With $z = (x, y)^T$:

$2\,e^{-(2x^2+y^2)/4} = e^{-(x^2+2y^2)/4}$

$\ln 2 - (2x^2+y^2)/4 + (x^2+2y^2)/4 = 0$

The two-dimensional decision boundary is thus represented by a hyperbola:

$x^2 - y^2 = 4\ln 2$
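Similarly for (b): points on the hyperbola $x^2 - y^2 = 4\ln 2$ should give equal unnormalized class scores. A quick numerical check with an arbitrary test point on the curve:

    import numpy as np

    pA, pB = 2 / 3, 1 / 3
    SA = np.array([[1.0, 0.0], [0.0, 2.0]])
    SB = np.array([[2.0, 0.0], [0.0, 1.0]])

    def score(z, prior, S):
        """Unnormalized class score: prior times the zero-mean Gaussian density."""
        norm = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(S)))
        return prior * norm * np.exp(-0.5 * z @ np.linalg.inv(S) @ z)

    x = 2.0                                  # pick x, then solve for y on the boundary
    y = np.sqrt(x ** 2 - 4 * np.log(2))
    z = np.array([x, y])
    print(score(z, pA, SA), score(z, pB, SB))    # equal on the boundary x^2 - y^2 = 4 ln 2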