CS 231A Computer Vision Midterm

Tuesday October 30, 2012

Set 1 Multiple Choice (22 points)

Each question is worth 2 points. To discourage random guessing, 1 point will be deducted for a wrong answer on multiple choice questions! For questions with multiple answers, 2 points will only be awarded if all correct choices are selected; otherwise, the answer is wrong and will incur a 1 point penalty. Please draw a circle around the option(s) to indicate your answer. No credit will be awarded for unclear/ambiguous answers.

1. (Pick one) If all of our data points are in $\mathbb{R}^2$, which of the following clustering algorithms can handle clusters of arbitrary shape?
(a) k-means.
(b) k-means++.
(c) EM with a Gaussian mixture model.
(d) mean-shift.
Answer: only (d).

2. (Pick one) Suppose we are using a Hough transform to do line fitting, but we notice that our system is detecting two lines where there is actually one in some example image. Which of the following is most likely to alleviate this problem?
(a) Increase the size of the bins in the Hough transform.
(b) Decrease the size of the bins in the Hough transform.
(c) Sharpen the image.
(d) Make the image larger.
Answer: (a).

3. (Pick one) Which of the following processes would help avoid aliasing while downsampling an image?

(a) Image sharpening.
(b) Image blurring.
(c) Median filtering where you replace every pixel by the median of pixels in a window around the pixel.
(d) Histogram equalization.
Answer: (b).

4. (Circle all that apply) A Sobel filter can be written as

$$\begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 \end{bmatrix} \qquad (1)$$

Which of the following statements are true?
(a) Separating the filter in the above manner reduces the number of computations.
(b) It is similar to applying a Gaussian filter followed by a derivative.
(c) Separation leads to spurious edge artifacts.
(d) Separation approximates the first derivative of Gaussian.
Answer: (a) and (b).

5. (Pick one) Which of the following is true for Eigenfaces (PCA)?
(a) Can be used to effectively detect deformable objects.
(b) Invariant to affine transforms.
(c) Can be used for lossy image compression.
(d) Is invariant to shadows.
Answer: (c).

6. (Pick one) Downsampling can lead to aliasing because
(a) Sampling leads to additions of low frequency noise.
(b) Sampled high frequency components result in apparent low frequency components.
(c) Sampling increases the frequency components in an image.
(d) Sampling leads to spurious high frequency noise.
Answer: (b).

7. (Pick one) If we replace one lens on a calibrated stereo rig with a bigger one, what can we say about the essential matrix, E, and the fundamental matrix, F?

(a) E can change due to a possible change in the physical length of the lens. F is unchanged.
(b) F can change due to a possible change in the lens characteristics. E is unchanged.
(c) E can change due to a possible change in the lens characteristics. F is unchanged.
(d) Both are unchanged.
Answer: (b).

8. (Pick one) Which of the following statements describes an affine camera but not a general perspective camera?
(a) Relative sizes of visible objects in a scene can be determined without prior knowledge.
(b) Can be used to determine the distance from an object of a known height.
(c) Approximates the human visual system.
(d) An infinitely long plane can be viewed as a line from the right angle.
Answer: (a).

9. (Circle all that apply) Which of the following could affect the intrinsic parameters of a camera?
(a) A crooked lens system.
(b) Diamond/Rhombus shaped pixels with non right angles.
(c) The aperture configuration and construction.
(d) Any offset of the image sensor from the lens's optical center.
Answer: (a), (b) and (d).

10. (Circle all that apply) For camera calibration, we learned that since there are 11 unknown parameters, we need at least 6 correspondences to calibrate. Assuming that you couldn't find a calibration target with the minimum of 6 corners to use as correspondences, you decide to take K pictures from different viewpoints of a stationary pattern with N corners, where N < 6. Which of the following statements is true?
(a) The number of images, K, must satisfy 2NK > 11 for the 11 unknowns; the value of N isn't important, so long as N > 0.
(b) The problem is unsolvable, since you do not have enough correspondences in a single image.
(c) The number of unknown parameters scales with the number of unique images taken.
(d) The number of unknown parameters is fixed, but the N corners must not be collinear.
Answer: (c).

11. (Circle all that apply) Which of the following statements about correlation are true in general?

(a) For a symmetric 1D filter, computing convolution of the filter with a signal is the same as computing correlation of the filter with the signal.
(b) Correlation computation can be made fast through the use of the Discrete Fourier Transform.
(c) Correlation computation is not shift invariant.
(d) The correlation method would be effective in solving the correspondence problem between two images of a checkerboard.
Answer: (a) and (b).
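Statements (a) and (b) of the last question can be checked numerically. The following is a minimal sketch, assuming NumPy and made-up signal and filter values; it compares correlation with convolution for a symmetric and an antisymmetric filter, and computes correlation through the FFT by convolving with the reversed filter.

```python
import numpy as np

# Hypothetical 1D signal and filters, chosen only for illustration.
signal = np.array([1.0, 4.0, 2.0, 7.0, 3.0, 5.0])
symmetric = np.array([1.0, 2.0, 1.0])
antisymmetric = np.array([1.0, 0.0, -1.0])


def correlate_fft(sig, kernel):
    """Full linear correlation computed with the DFT.

    Correlation with a real kernel equals convolution with the reversed
    kernel, and convolution becomes a product in the frequency domain.
    Zero-padding to len(sig) + len(kernel) - 1 avoids circular wrap-around.
    """
    n = len(sig) + len(kernel) - 1
    spectrum = np.fft.rfft(sig, n) * np.fft.rfft(kernel[::-1], n)
    return np.fft.irfft(spectrum, n)


corr_sym = np.correlate(signal, symmetric, mode="full")
conv_sym = np.convolve(signal, symmetric, mode="full")
corr_anti = np.correlate(signal, antisymmetric, mode="full")
conv_anti = np.convolve(signal, antisymmetric, mode="full")
corr_sym_fft = correlate_fft(signal, symmetric)

print("symmetric filter, correlation == convolution:", np.allclose(corr_sym, conv_sym))
print("antisymmetric filter, correlation == convolution:", np.allclose(corr_anti, conv_anti))
print("FFT correlation matches direct correlation:", np.allclose(corr_sym_fft, corr_sym))
```

For the antisymmetric filter the two results differ by a sign, which is why statement (a) needs the symmetry condition.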

2 True or False (10 points)

True or false. Correct answers are 1 point, -1 point for each incorrect answer.

(a) (True/False) Fisherfaces works better at discrimination than Eigenfaces because Eigenfaces assumes that the faces are aligned.
(False), both assume the faces are aligned.

(b) (True/False) If you don't normalize your data to have zero mean, then the first principal component found via PCA is likely to be uninformative.
(True)

(c) (True/False) Given sufficiently many weak classifiers, boosting is guaranteed to get perfect accuracy on the training set no matter what the training data looks like.
(False), one could have two data points which are identical except for their label.

(d) (True/False) Boosting always makes your algorithm generalize better.
(False), you can overfit.

(e) (True/False) It is possible to blur an image using a linear filter.
(True)

(f) (True/False) When extracting the eigenvectors of the similarity matrix for an image to do clustering, the first eigenvector to use should be the one corresponding to the second largest eigenvalue, not the largest.
(False), it is the largest one.

(g) (True/False) The Canny edge detector is a linear filter because it uses the Gaussian filter to blur the image and then uses the linear filter to compute the gradient.
(False), it has non-linear operations: thresholding, hysteresis, non-maximal suppression.

(h) (True/False) A zero skew intrinsic matrix is not full rank because it has one less DOF.
(False), it is still full rank, even if it has one less DOF.

(i) (True/False) Compared to the normalized cut algorithm, the partitions of minimum cut are always strictly smaller.
(False). (This question was a bit ambiguous, so we gave everyone credit for any answer.) Min cut prefers very small and very large partitions; normalized cut prefers partitions of roughly equal size.

(j) (True/False) Assuming the camera coordinate system is the same as the world coordinate system, the intrinsic and extrinsic parameters of a camera can map any point in homogeneous world coordinates to a unique point in the image plane.
(False). In this situation, the vector $(0, 0, 0, 1)^T$ spans the null space of the camera matrix and represents the camera origin, and the projective line in this case is ambiguous.
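The answers to (h) and (j) can be illustrated with a small numerical check. This is a sketch under stated assumptions: the intrinsic values below are hypothetical, and the camera frame is taken to coincide with the world frame so that the camera matrix is $P = K [I \mid 0]$.

```python
import numpy as np

# Hypothetical zero-skew intrinsic matrix (focal lengths and principal point
# are made-up values used only for illustration).
K = np.array([
    [800.0, 0.0, 320.0],
    [0.0, 800.0, 240.0],
    [0.0, 0.0, 1.0],
])

# Camera frame equals world frame: rotation is identity, translation is zero.
extrinsics = np.hstack([np.eye(3), np.zeros((3, 1))])
P = K @ extrinsics  # 3x4 camera matrix

camera_center = np.array([0.0, 0.0, 0.0, 1.0])  # homogeneous world point
projected = P @ camera_center  # the zero vector, which is not a valid image point

rank_K = np.linalg.matrix_rank(K)
rank_P = np.linalg.matrix_rank(P)

print("P @ camera_center =", projected)
print("rank(K) =", rank_K, " rank(P) =", rank_P)
```

The zero-skew K still has rank 3, and the 3x4 matrix P has a one-dimensional null space spanned by the camera center.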

3 Long Answer (32 points)

11. (10 points) Detecting Patterns with Filters

A Gabor filter is a linear filter that is used in image processing to detect patterns of various orientations and frequencies. A Gabor filter is composed of a Gaussian kernel function that has been modulated by a sinusoidal plane wave. The real-valued version of the filter is shown below.

$$g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \cos\left(2\pi \frac{x'}{\lambda} + \psi\right)$$

where

$$x' = x \cos(\theta) + y \sin(\theta)$$
$$y' = -x \sin(\theta) + y \cos(\theta)$$

Figure 1 shows an example of a 2D Gabor filter.

Figure 1: 2D Gabor Filter

(a) (6 points) What is the physical meaning of each of the five parameters of the Gabor filter, λ, θ, ψ, σ, γ, and how do they affect the impulse response?
Hint: The impulse response of a Gaussian filter is shown in Equation 2. It is normally radially symmetric; how would you make this filter elliptical? How would you make this filter steerable? What does the 2D cosine modulation do to this filter?

$$\text{gaussian}(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right) \qquad (2)$$

Answer:
λ: the wavelength of the sinusoidal factor.
θ: the orientation of the normal to the parallel stripes of a Gabor function.
ψ: the phase offset.
σ: the sigma of the Gaussian envelope.
γ: the spatial aspect ratio, which specifies the ellipticity of the support of the Gabor function.

(b) (4 points) Given a Gabor filter that has been tuned to maximally respond to the striped pattern shown in Figure 2, how would these parameters, $\lambda_0, \theta_0, \psi_0, \sigma_0, \gamma_0$, have to be modified to recognize the following variations? Provide the values of the new parameters in terms of the original values.

Figure 2: Reference Pattern

Answers:
i. $\theta = \theta_0 + \frac{\pi}{4}$
ii. $\psi = \psi_0 + \pi$
iii. $\theta = \theta_0 + \frac{\pi}{4}$, $\sigma = 2\sigma_0$, $\lambda = 2\lambda_0$
iv. $\gamma = \frac{1}{2}\gamma_0$
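The Gabor definition from this question can be sampled on a pixel grid to see how the parameters act. The following is a minimal sketch, assuming NumPy; the function name `gabor_kernel`, the kernel size, and all parameter values are hypothetical choices for illustration, not values taken from the figures.

```python
import numpy as np


def gabor_kernel(size, lam, theta, psi, sigma, gamma):
    """Sample the real-valued Gabor filter on a size x size grid centred at 0."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinates by theta (steers the filter).
    x_rot = xs * np.cos(theta) + ys * np.sin(theta)
    y_rot = -xs * np.sin(theta) + ys * np.cos(theta)
    # Elliptical Gaussian envelope: gamma stretches the support along y_rot.
    envelope = np.exp(-(x_rot**2 + gamma**2 * y_rot**2) / (2.0 * sigma**2))
    # Cosine carrier with wavelength lam and phase offset psi.
    carrier = np.cos(2.0 * np.pi * x_rot / lam + psi)
    return envelope * carrier


# Hypothetical reference parameters.
params = dict(lam=8.0, theta=0.0, psi=0.0, sigma=5.0, gamma=0.5)
base = gabor_kernel(31, **params)

# A phase shift of pi inverts the stripes (bright and dark bands swap).
inverted = gabor_kernel(31, **{**params, "psi": params["psi"] + np.pi})

# Rotating theta by pi/2 turns the stripes by 90 degrees.
rotated = gabor_kernel(31, **{**params, "theta": params["theta"] + np.pi / 2})

print("phase shift by pi negates the kernel:", np.allclose(inverted, -base))
print("rotation by pi/2 transposes the kernel:", np.allclose(rotated, base.T))
```

Doubling both `sigma` and `lam` scales the whole pattern by two, which matches the scaled variation in part (b).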

12. (10 points) Stereo Reconstruction

Figure 3: Rectified Stereo Rig

(a) (2 points) The figure above shows a rectified stereo rig with camera centers $O$ and $O'$, focal length $f$ and baseline $B$. $x$ and $x'$ are the projections of the point $P$ onto the virtual image planes; note that since $x'$ is to the left of $O'$, it is negative. Give an expression for the depth of the point P, shown in the diagram as Z. Also give an expression for the X coordinate of the point P in world coordinates, assuming an origin of O. You can assume that the two are pinhole cameras for the rest of this question.

Answer:
$$Z = \frac{fB}{x - x'}, \qquad X = \frac{xZ}{f}$$
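The two expressions in part (a) can be checked by projecting a known point and reconstructing it. This is a small sketch with hypothetical values for f, B and P; the helper name `triangulate_rectified` is an assumption, and the origin is placed at the left camera center O with the right camera at X = B.

```python
def triangulate_rectified(x_left, x_right, f, B):
    """Recover (X, Z) of a point from its image coordinates in a rectified rig.

    x_left and x_right are measured relative to each camera's own image
    center, so the disparity x_left - x_right is positive for points in
    front of the rig.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    Z = f * B / disparity
    X = x_left * Z / f
    return X, Z


# Hypothetical rig and point (units are arbitrary but consistent).
f, B = 2.0, 1.0
X_true, Z_true = 0.5, 4.0

# Pinhole projection into each camera.
x_left = f * X_true / Z_true          # 0.25, right of O
x_right = f * (X_true - B) / Z_true   # -0.25, left of O'

X_rec, Z_rec = triangulate_rectified(x_left, x_right, f, B)
print("reconstructed (X, Z):", (X_rec, Z_rec))
```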

(b) (4 points)

Figure 4: Rectified Stereo Rig with image plane error

Now assume that the camera system can't perfectly capture the projected point locations on the image planes, so there is now some uncertainty about the point's location, since a real digital camera's image plane is discretized. Assume that the original $x$ and $x'$ positions now have an uncertainty of $\pm e$, which is related to the discretization of the image plane.
i. Give an expression for the X, Z locations of the 4 intersection points resulting from the virtual image plane uncertainty.
ii. Give an expression for the maximum uncertainty in the X and Z directions of the point P's location in world coordinates.
All expressions should be in terms of image coordinates only. You can assume that $x$ is always positive and $x'$ is always negative.

Answer:
$$d = x - x'$$
$$Z_{min} = \frac{fB}{(x + e) - (x' - e)} = \frac{fB}{d + 2e}$$
$$Z_{max} = \frac{fB}{(x - e) - (x' + e)} = \frac{fB}{d - 2e}$$
$$Z_{diff} = Z_{max} - Z_{min} = fB \, \frac{4e}{(d - 2e)(d + 2e)}$$
$$Z_{mid} = \frac{fB}{(x - e) - (x' - e)} = \frac{fB}{x - x'}$$
$$X_{min} = \frac{(x - e) Z_{mid}}{f}, \qquad X_{max} = \frac{(x + e) Z_{mid}}{f}$$
$$X_{max} - X_{min} = \frac{Z_{mid} (2e)}{f}$$
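The uncertainty bounds from part (b) are easy to evaluate numerically. The sketch below uses hypothetical values for f, B, e and the image coordinates; the function name `depth_bounds` is an assumption introduced only for this example.

```python
import math


def depth_bounds(x, x_prime, f, B, e):
    """Depth and lateral uncertainty for image coordinates known to within +/- e."""
    d = x - x_prime
    if d <= 2 * e:
        raise ValueError("disparity must exceed 2e, otherwise Z_max is unbounded")
    z_min = f * B / (d + 2 * e)   # both errors increase the disparity
    z_max = f * B / (d - 2 * e)   # both errors decrease the disparity
    z_mid = f * B / d             # errors cancel, disparity unchanged
    return {
        "z_min": z_min,
        "z_max": z_max,
        "z_mid": z_mid,
        "z_diff": z_max - z_min,
        "x_diff": 2 * e * z_mid / f,
    }


# Hypothetical rig: f = 2, B = 1, half-pixel error e = 0.01.
f, B, e = 2.0, 1.0, 0.01
bounds = depth_bounds(x=0.25, x_prime=-0.25, f=f, B=B, e=e)
d = 0.5

# Closed forms from the answer above, for comparison.
z_diff_closed = f * B * 4 * e / ((d - 2 * e) * (d + 2 * e))
z_diff_in_z = 4 * f * B * e / ((f * B / bounds["z_mid"]) ** 2 - 4 * e**2)

print(bounds)
print("closed forms agree:", math.isclose(bounds["z_diff"], z_diff_closed))
```

Because the disparity appears in the denominator, the depth interval widens rapidly as the point moves away and d approaches 2e.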

(c) (4 points) Assume the X coordinate of the point P is fixed.
i. Give an expression for the uncertainty in the reconstruction of Z, in terms of the actual value of Z and the other parameters of the stereo rig.
ii. What is the depth uncertainty when Z is equal to zero?
iii. Find the depth when the uncertainty is at its maximum and give a physical interpretation and a drawing to explain.

Answer:
$$d = \frac{fB}{Z}$$
$$Z_{diff}(Z) = \frac{4fBe}{\left(\frac{fB}{Z}\right)^2 - 4e^2}$$
$Z_{diff} = 0$ when $Z = 0$. This is when P lies between the two camera origins, so the ray cast by P intersects the image plane at infinity, which means an infinite discrepancy.
$Z_{diff} = \infty$ when $Z = \frac{fB}{2e}$. When Z takes on this value, the disparity between the two cameras equals 2e, which means that with the additional e error from both cameras, the point P will be viewed with an effective disparity of zero. One of the reconstructed distances will then be infinite while the other will be finite, so the difference will be infinite. The point is so far away that the small error term causes the stereo rig to reconstruct it at infinity.

13. (12 points) AdaBoost algorithm for Face Detection

Let $f_M(x)$ be the classifying function learnt after the $M$th iteration.

$$f_M(x) = \sum_{m=1}^{M} \beta_m C_m(x) \qquad (3)$$

where $C_m(x)$, $\forall m \in \{1, \ldots, M\}$, are weak classifiers learned in M iterations, $C_m : \mathbb{R} \to \{-1, 1\}$. We will now look at a derivation for the optimal $\beta, C$ at the $m$th iteration given $\beta_k, C_k$ $\forall k \in \{1, \ldots, m-1\}$. You are also given N training samples $\{(x_i, y_i)\}_{i=1,\ldots,N}$, where $x_i$ is a data point and $y_i \in \{-1, 1\}$ is the corresponding output.

(a) (2 points) $(\beta_m, C_m) = \arg\min_{\beta, C} \left( \sum_{i=1}^{N} L[y_i, f_{m-1}(x_i) + \beta C(x_i)] \right)$, where $L[y, g] = \exp(-yg)$ is the loss function. Show that $(\beta_m, C_m)$ can be written in the form

$$(\beta_m, C_m) = \arg\min_{\beta, C} \sum_{i=1}^{N} w_i^{(m-1)} \exp\{-\beta y_i C(x_i)\} \qquad (4)$$

Give an expression for $w_i^{(m-1)}$. Note that $w_i^{(m-1)}$ is the weight associated with the $i$th data point after $m-1$ iterations.

Answer:
$$(\beta_m, C_m) = \arg\min_{\beta, C} \sum_{i=1}^{N} \exp\{-y_i f_{m-1}(x_i) - \beta y_i C(x_i)\} = \arg\min_{\beta, C} \sum_{i=1}^{N} w_i^{(m-1)} \exp\{-\beta y_i C(x_i)\}$$
$$w_i^{(m-1)} = \exp\{-y_i f_{m-1}(x_i)\}$$

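(b) (3 points) Express the optimal $C_m$ in the form $\arg\min_C Err(C)$, where $Err(C)$ is an error function and is independent of β. $Err(C)$ should be defined in terms of the indicator function $I[C(x_i) \neq y_i]$ given by

$$I[C(x_i) \neq y_i] = \begin{cases} 1, & \text{if } C(x_i) \neq y_i \\ 0, & \text{if } C(x_i) = y_i \end{cases}$$

Answer:
$$C_m = \arg\min_C \; e^{-\beta} \sum_{y_i = C(x_i)} w_i^{(m-1)} + e^{\beta} \sum_{y_i \neq C(x_i)} w_i^{(m-1)}$$
$$C_m = \arg\min_C \; (e^{\beta} - e^{-\beta}) \sum_{i=1}^{N} w_i^{(m-1)} I[y_i \neq C(x_i)] + e^{-\beta} \sum_{i=1}^{N} w_i^{(m-1)}$$
$$C_m = \arg\min_C \; \sum_{i=1}^{N} w_i^{(m-1)} I[y_i \neq C(x_i)]$$
The last step holds for $\beta > 0$, since the second term does not depend on C.

(c) (3 points) Using the $C_m$ from part (b), the optimal expression for $\beta_m$ can be obtained as

$$\beta_m = \frac{1}{2} \log\left(\frac{1 - err_m}{err_m}\right), \qquad \text{where } err_m = \frac{\sum_{i=1}^{N} w_i^{(m-1)} I[y_i \neq C_m(x_i)]}{\sum_{i=1}^{N} w_i^{(m-1)}}$$

Now, find the update equation for $w_i^{(m)}$ and show that $w_i^{(m)} \propto w_i^{(m-1)} \exp(2\beta_m I[y_i \neq C_m(x_i)])$.

Answer: From part (a), we have
$$w_i^{(m)} = w_i^{(m-1)} e^{-\beta_m y_i C_m(x_i)}$$
Since $y_i C_m(x_i) = 1 - 2 I[y_i \neq C_m(x_i)]$,
$$w_i^{(m)} = w_i^{(m-1)} e^{2\beta_m I[y_i \neq C_m(x_i)]} e^{-\beta_m}$$
The factor $e^{-\beta_m}$ is the same for every data point, so it can be dropped.

(d) (4 points) We will use this algorithm to classify some simple faces. The set of training images is given in Fig. 5. $x_i$ is the face and $y_i$ is the corresponding face label, $\forall i \in \{1, 2, 3, 4, 5\}$. We are also given 3 classifier patches $p_1, p_2, p_3$ in Fig. 6. A patch detector $I^{(+)}(x, p_j)$ is defined as follows:

$$I^{(+)}(x, p_j) = \begin{cases} 1, & \text{if image } x \text{ contains patch } p_j \\ -1, & \text{otherwise} \end{cases}$$
$$I^{(-)}(x, p_j) = -I^{(+)}(x, p_j)$$

All classifiers $C_m$ are restricted to belong to one of the 6 patch detectors, i.e. $C_m(x) \in \{I^{(\pm)}(x, p_1), I^{(\pm)}(x, p_2), I^{(\pm)}(x, p_3)\}$. If $C_1(x) = I^{(+)}(x, p_2)$, $w_i^{(0)} = 1$ $\forall i \in \{1, 2, 3, 4, 5\}$ and $\beta_1 = 1$,
i. What is the optimal $C_2(x)$?
ii. What are the updated weights $w_i^{(1)}$?
iii. What is the final classifier $f_2(x)$ combining $C_1, C_2$?
iv. Does $I[f_2(x) > 0]$ correctly classify all training faces?

Figure 5: Training Set Faces ($x_1, x_2, x_3$ with $y_1 = y_2 = y_3 = +1$; $x_4, x_5$ with $y_4 = y_5 = -1$)

Figure 6: Classifier patches ($p_1, p_2, p_3$)

Answer:
$C_2(x) = I^{(+)}(x, p_3)$
$w_i^{(1)} \propto \exp(2\beta_2)$ for $i = 1, 5$
$w_i^{(1)} \propto 1$ for $i = 2, 3, 4$
where $\beta_2 = 0.5 \log(1.5)$
$f_2(x) = C_1(x) + 0.5 \log(1.5) \, C_2(x)$
Yes, it correctly classifies all the training images.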
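One round of the derivation in parts (b) and (c) can be written out in a few lines. This is a rough sketch, assuming NumPy; the labels and candidate classifier outputs are hypothetical toy data and are not the faces or patches of Figures 5 and 6, and the function name `adaboost_round` is introduced only for illustration.

```python
import numpy as np


def adaboost_round(y, candidate_preds, w):
    """Pick the weak classifier with the lowest weighted error, then reweight.

    y: labels in {-1, +1}, shape (N,)
    candidate_preds: outputs of each candidate classifier, shape (num_candidates, N)
    w: current non-negative weights, shape (N,)
    """
    miss = candidate_preds != y                     # indicator I[y_i != C(x_i)]
    errs = (miss * w).sum(axis=1) / w.sum()         # weighted error per candidate
    best = int(np.argmin(errs))
    err = np.clip(errs[best], 1e-12, 1.0 - 1e-12)   # guard against log(0)
    beta = 0.5 * np.log((1.0 - err) / err)
    # Update up to the constant factor exp(-beta), which is shared by all points.
    w_new = w * np.exp(2.0 * beta * miss[best])
    return best, beta, w_new


# Hypothetical toy problem with five samples and two candidate classifiers.
y = np.array([1, 1, 1, -1, -1])
candidate_preds = np.array([
    [1, 1, -1, -1, -1],   # misclassifies sample index 2 only
    [-1, 1, 1, -1, 1],    # misclassifies sample indices 0 and 4
])
w0 = np.ones(5)

best, beta, w1 = adaboost_round(y, candidate_preds, w0)
print("chosen classifier:", best)
print("beta:", beta)
print("updated weights:", w1)
```

After the update, the chosen classifier has a weighted error of exactly one half under the new weights, which is what pushes the next round toward a different classifier.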

4 Short Answer (36 points)

14. (6 points) Parallel Lines under Perspective Transforms

Figure 7: Boxes rendered using different projections

(a) (2 points) The two boxes in Figure 7 represent the same 3D shape rendered using two projective techniques. Explain their different appearance and the types of projections used to map the objects to the image plane.
Answer: The figure on the left is an orthographic projection, so parallel lines remain parallel. The figure on the right is a perspective projection, so parallel lines at an angle to the camera plane have a vanishing point.

(b) (2 points) For each projection, if the edges of the cubes were to be extended to infinity, how many intersection points would there be?
Answer: Left, none; right, 3 vanishing points.

(c) (1 point) What is the maximum number of vanishing points that are possible for an arbitrary image?
Answer: There is no limit; any set of parallel lines at an angle to the camera will converge at a vanishing point.

(d) (1 point) How would you arrange parallel lines so that they do not appear to have a vanishing point?
Answer: Place the lines parallel to the camera plane; they will converge at the point at infinity.

15. (6 points) Using RANSAC to find circles

Suppose we would like to use RANSAC to find circles in $\mathbb{R}^2$. Let $D = \{(x_i, y_i)\}_{i=1}^{n}$ be our data, and let $I$ be the random seed group of points used in RANSAC.

(a) (2 points) The next step of RANSAC is to fit a circle to the points in $I$. Formulate this as an optimization problem. That is, represent fitting a circle to the points as a problem of the form

$$\min_{c_x, c_y, r} \sum_{i \in I} L(x_i, y_i, c_x, c_y, r)$$

where $L$ is a function for you to determine which gives the distance from $(x_i, y_i)$ to the circle with center $(c_x, c_y)$ and radius $r$.
Answer: $L(x_i, y_i, c_x, c_y, r) = \left| \sqrt{(x_i - c_x)^2 + (y_i - c_y)^2} - r \right|$

(b) (2 points) What might go wrong in solving the problem you came up with in (a) when $I$ is too small?
Answer: The problem is underdetermined. With, e.g., 2 points there are infinitely many circles one can fit perfectly.

(c) (2 points) The next step in our RANSAC procedure is to determine what the inliers are, given the circle $(c_x, c_y, r)$. Using these inliers we refit the circle and determine new inliers in an iterative fashion. Define mathematically what an inlier is for this problem. Mention any free variables.
Answer: An inlier is a point $(x, y)$ such that $\left| \sqrt{(x - c_x)^2 + (y - c_y)^2} - r \right| \le T$ for some threshold $T$.

16. (6 points) Fast forward camera

Figure 8: Camera movement

Suppose you capture two images $P$ and $P'$ in pure translation in the Z direction shown in Figure 8. Image planes are parallel to the XY plane.

(a) (3 points) Suppose the center of an image is (0, 0). For a point (a, b) on image $P$, what is the corresponding epipolar line on image $P'$?
Answer: The line goes through (0, 0) and (a, b) on image $P'$.
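The epipolar line claim for forward motion can be verified with the standard relation $l' = E x$. The sketch below is an illustration under stated assumptions: the cameras are calibrated with normalized image coordinates, there is no rotation between the views, the translation is a unit step along Z, and E is built as the skew-symmetric matrix of that translation. The helper names `skew` and `epipolar_line` and the sample point are hypothetical.

```python
import numpy as np


def skew(t):
    """Skew-symmetric matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([
        [0.0, -t[2], t[1]],
        [t[2], 0.0, -t[0]],
        [-t[1], t[0], 0.0],
    ])


def epipolar_line(E, a, b):
    """Line coefficients (l1, l2, l3) with l1*u + l2*v + l3 = 0 in the other image."""
    return E @ np.array([a, b, 1.0])


t = np.array([0.0, 0.0, 1.0])   # assumed pure translation along Z
E = skew(t)                     # R = I, so E = [t]_x

a, b = 3.0, -2.0                # hypothetical image point
line = epipolar_line(E, a, b)

on_center = line @ np.array([0.0, 0.0, 1.0])   # does the line contain (0, 0)?
on_point = line @ np.array([a, b, 1.0])        # does the line contain (a, b)?

print("E =\n", E)
print("epipolar line:", line)
print("passes through (0, 0) and (a, b):", np.isclose(on_center, 0.0) and np.isclose(on_point, 0.0))
```

The epipole sits at the image center because the camera moves along its own optical axis, so every epipolar line is a ray through (0, 0).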

(b) (3 points) What is the essential matrix in this case assuming the camera is calibrated?
Answer:
$$E = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

17. (6 points) K-Means

Figure 9: A wild jackalope

(a) (3 points) What is likely to happen if we run k-means to cluster pixels when we only represent pixels by their location? With k = 4, draw the boundary around each cluster and mark each cluster center with a point for some clusters that might result when running k-means to convergence. Draw on Figure 9, and set the initial cluster centers to be the four corners of the image.
Answer: For the image, we expect each quadrant to be its own cluster.

(b) (1 point) What does this tell us about using pixel locations as features?
Answer: It is not sufficient; we need richer features.

(c) (2 points) We replace the sum of squared distances of all points to the nearest cluster center criterion in k-means with the sum of absolute distances of all points to the nearest cluster center, i.e. our distance is now given by $d(x_1, x_2) = \|x_1 - x_2\|_1$. How would the update step change for finding the cluster center?
Answer: At every iteration
1. Assign points to the closest cluster center.
2. Update the cluster centers with the median of points belonging to a cluster.

18. (6 points) Canny Edge Detector

(a) (4 points) There is an edge detected using the Canny method. This detected edge is then rotated by θ as shown in Figure 10, where the relation between a point on the original edge $(x, y)$ and a point on the rotated edge $(x', y')$ is given by

$$x' = x \cos\theta \qquad (5)$$
$$y' = x \sin\theta \qquad (6)$$

Will the rotated edge be detected using the Canny method? Provide either a mathematical proof or a counter example.

Figure 10: Edge Rotated by θ

Answer: Our rotation is given by
$$x' = x \cos\theta, \qquad y' = x \sin\theta$$
Our Canny edge depends on the magnitude of the derivative, which is the only part of the algorithm that could have really changed. This is given by
$$D_{x'x'}^2 + D_{y'y'}^2 = \cos^2\theta \, D_{xx}^2 + \sin^2\theta \, D_{xx}^2 = D_{xx}^2$$
which is the same rule as for the original edge. Thus we have shown that the Canny method is rotationally invariant.

(b) (2 points) After running the Canny edge detector on an image, you notice that long edges are broken into short segments separated by gaps. In addition, some spurious edges appear. For each of the two thresholds (low and high) used in hysteresis thresholding, state how you would adjust the threshold (up or down) to address both problems. Assume that a setting exists for the two thresholds that produces the desired result. Explain your answer very briefly.
Answer: The gaps in the long edges require a lower low threshold: parts of the long edge are detected, so the high threshold is low enough for these edges, but the edges are disconnected because the low threshold is too high. Lowering the low threshold will include more pixels of the long edges. Eliminating the spurious edges requires a higher high threshold. The high threshold should be increased only slightly, so as not to make the long edges disappear. The assumption in the problem statement ensures that this is possible.

19. (6 points) Cascaded Hough transform for detecting vanishing points

Figure 11: Hough Transform (axes: X AXIS, Y AXIS)

For this problem we are going to use the slope m, intercept c representation of a line, $y = mx + c$. The attached Figure 11 shows the vertices of a rectangular patch under perspective transformation. We wish to find the vanishing point in the image through the Hough Transform.

(a) (2 points) Plot the Hough transform representation of the image. Assume no binning and make plots in a continuous (m, c) space. Just show the points with two or more votes.
Answer: See Figure 12.

(b) (2 points) Now using the $y = mx + c$ representation, run the Hough transform on the results from part (a) (after using a threshold of 2 votes) to get a representation in the (x, y) space again.

Figure 12: Make your plot here (axes: slope (m), intercept (c))

Answer: The same plot as the plot in the problem statement, with an intersection at (2.5, 1.75) and the diagonals of the trapezoid (do not deduct points if diagonals are not shown).

(c) (2 points) Find the vanishing point from the representation in part (b).
Answer: The vanishing point is (2.5, 2.5).
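The idea behind the cascaded transform is that every image line $y = mx + c$ passing through a common point $(x_0, y_0)$ satisfies $c = -x_0 m + y_0$, so those lines are collinear in (m, c) space. The sketch below recovers the common point by a least-squares fit, assuming NumPy; the three lines are hypothetical values constructed to meet at (2.5, 2.5) and are not read from Figure 11, and the function name `common_point` is introduced only for illustration.

```python
import numpy as np


def common_point(lines):
    """Least-squares intersection of lines given as (m, c) pairs.

    Each line through (x0, y0) obeys c = -x0 * m + y0, which is linear in
    the unknowns (x0, y0), so the points (m, c) lie on one line in Hough space.
    """
    lines = np.asarray(lines, dtype=float)
    A = np.column_stack([-lines[:, 0], np.ones(len(lines))])
    solution, *_ = np.linalg.lstsq(A, lines[:, 1], rcond=None)
    return solution  # array([x0, y0])


# Hypothetical lines in slope-intercept form that all pass through (2.5, 2.5).
lines = [
    (1.0, 0.0),    # y = x
    (-1.0, 5.0),   # y = -x + 5
    (0.5, 1.25),   # y = 0.5 x + 1.25
]

vanishing_point = common_point(lines)
print("estimated common point:", vanishing_point)
```

With noisy line estimates the same fit gives the point that best explains all lines in the least-squares sense, which plays the role of the peak in the second Hough accumulator.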