Multi-Frame Alignment of Planes

Lihi Zelnik-Manor    Michal Irani
Dept. of Computer Science and Applied Math
The Weizmann Institute of Science
Rehovot, Israel

Abstract

Traditional plane alignment techniques are typically performed between pairs of frames. In this paper we present a method for extending existing two-frame planar-motion estimation techniques into a simultaneous multi-frame estimation, by exploiting multi-frame geometric constraints of planar surfaces. The paper has three main contributions: (i) we show that when the camera calibration does not change, the collection of all parametric image motions of a planar surface in the scene across multiple frames is embedded in a low dimensional linear subspace; (ii) we show that the relative image motion of multiple planar surfaces across multiple frames is embedded in a yet lower dimensional linear subspace, even with varying camera calibration; and (iii) we show how these multi-frame constraints can be incorporated into simultaneous multi-frame estimation of planar motion, without explicitly recovering any 3D information or camera calibration. The resulting multi-frame estimation process is more constrained than the individual two-frame estimations, leading to more accurate alignment, even when applied to small image regions.

1 Introduction

Plane stabilization ("2D parametric alignment") is essential for many video-related applications: it is used for video stabilization and visualization, for 3D analysis (e.g., using the Plane+Parallax approach [5]), for moving object detection, for mosaicing, etc. Many techniques have been proposed for estimating the 2D parametric motion of a planar surface between two frames; some examples are [2, 3, 6, 7, 12]. While these techniques are very robust and perform well when the planar surface captures a large image region, they tend to be highly inaccurate when applied to small image regions. Moreover, errors can accumulate over a sequence of frames when the motion estimation is performed between successive pairs of frames (as is often done in
mosaic construction). An elegant approach was presented in [9] for automatically estimating an optimal (usually virtual) reference frame for a sequence of images, with the corresponding motion parameters that relate each frame to the virtual reference frame. This overcomes the problem associated with error accumulation in sequential frame alignment. However, the alignment method used for estimating the motion between the virtual reference frame and all other frames remains a two-frame alignment method. Sequential two-frame parametric alignment methods do not exploit the fact that all frames imaging the same planar surface share the same plane geometry.

In this paper we present a method for extending traditional two-frame planar-motion estimation techniques into a simultaneous multi-frame estimation method, by exploiting multi-frame linear subspace constraints of planar motions (Section 4). The use of linear subspace constraints for motion analysis was introduced by Tomasi and Kanade [11]. They used these constraints for factoring 2D correspondences into 3D motion and shape information. In contrast, here we use linear subspace constraints for constraining our 2D planar motion estimation, and not for factoring out any 3D information. This results in a multi-frame estimation technique which is more constrained than the individual two-frame estimation processes, leading to more accurate alignment, even when applied to small image regions. Furthermore, multi-frame rigidity constraints relating multiple planar surfaces are applied to further enhance parametric motion estimation in scenes with multiple planar surfaces (Section 5).

2 Basic Model and Notations

The instantaneous image motion of a 3D planar surface between two image frames can be expressed as a 2D quadratic transformation [8, 1, 5, 6]:

    ~u(~x; ~p) = X(~x) ~p                                            (1)

where X(~x) is a 2x8 matrix which depends only on the pixel coordinates ~x = (x, y):

    X(~x) = [ 1  x  y  0  0  0  x^2  xy ]
            [ 0  0  0  1  x  y  xy   y^2 ]

and ~p = (p1, p2, ..., p8)^t is a parameter vector:

    p1 = f2 (γ tX + ΩY)              p2 = (f2/f1)(1 + α tX) − γ tZ − 1
    p3 = −(f2/f1)(ΩZ − β tX)         p4 = f2 (γ tY − ΩX)
    p5 = (f2/f1)(ΩZ + α tY)          p6 = (f2/f1)(1 + β tY) − γ tZ − 1
    p7 = (ΩY − α tZ)/f1              p8 = −(ΩX + β tZ)/f1            (2)
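For concreteness, the quadratic motion model above, and the direct least-squares step built on it later in the paper (accumulate C = Σ X^t ∇J ∇J^t X and b = Σ X^t ∇J (J − K), then solve C~p = ~b), can be sketched in a few lines of numpy. This is our own illustrative sketch, not the authors' code; the image and all names are synthetic.

```python
import numpy as np

def X_of(x, y):
    """The 2x8 matrix X(x) of the quadratic motion model at pixel (x, y)."""
    return np.array([[1, x, y, 0, 0, 0, x * x, x * y],
                     [0, 0, 0, 1, x, y, x * y, y * y]], float)

def quadratic_flow(x, y, p):
    """Evaluate u(x; p) = X(x) p, returning the flow components (u, v)."""
    return X_of(x, y) @ p

def direct_step(J, K):
    """One linearized least-squares step of the direct method:
    accumulate C = sum X^t gJ gJ^t X and b = sum X^t gJ (J - K),
    then solve the 8x8 linear system C p = b."""
    gy, gx = np.gradient(J)                      # spatial gradient of J
    C = np.zeros((8, 8))
    b = np.zeros(8)
    for yy in range(J.shape[0]):
        for xx in range(J.shape[1]):
            Xg = X_of(xx, yy).T @ np.array([gx[yy, xx], gy[yy, xx]])
            C += np.outer(Xg, Xg)
            b += Xg * (J[yy, xx] - K[yy, xx])
    return np.linalg.solve(C, b)

# Synthetic check: generate K from J with a known horizontal shift of 0.3
# pixels, constructed so that the linearized brightness constancy holds
# exactly; the step then recovers p = (0.3, 0, ..., 0).
rng = np.random.default_rng(0)
J = rng.normal(size=(16, 16))
_, gx = np.gradient(J)
K = J - 0.3 * gx
p = direct_step(J, K)
```

On real images the linearization is only approximate, which is why the method iterates this step coarse-to-fine over a Gaussian pyramid, as described in the text.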
where ~n = (α, β, γ)^t is the normal of the plane (i.e., ~Q^t ~n = 1, ∀~Q on the plane), ~Ω = (ΩX, ΩY, ΩZ)^t and ~t = (tX, tY, tZ)^t are the camera rotation and translation, respectively, and f1 and f2 are the camera focal lengths used for obtaining the two images. The instantaneous motion model is valid when the camera rotation is small and the forward translation is small relative to the depth.

3 Two-Frame Parametric Alignment

In this paper, we extend the direct two-frame motion estimation approach of [3, 6] to multiple frames. To make the paper self-contained, we briefly outline the basic two-frame technique below. The two image frames (whose parametric image motion is being estimated) are referred to as the "reference" image J and the "inspection" image K. A Gaussian pyramid is constructed for J and K, and the motion parameters from J to K are estimated in a coarse-to-fine manner. Within each pyramid level the sum of squared linearized differences (i.e., the linearized brightness constancy measure) is used as a match measure. This measure (Err) is minimized with respect to the unknown 2D motion parameters ~p of Eq. (1):

    Err(~p) = Σ_{~x} ( K(~x) − J(~x) + ∇J(~x)^t ~u(~x; ~p) )^2

where ~u(~x; ~p) is defined in Eq. (1), J(~x) and K(~x) denote the brightness values of images J and K at pixel ~x, respectively, and ∇J(~x) = (∂J/∂x, ∂J/∂y)^t (~x) denotes the spatial gradient of J at ~x. The sum is computed over all the points within a region of interest (often the entire image). Differentiating Err with respect to the unknown parameters ~p and setting the derivative to zero yields eight linear equations in the eight unknowns:

    C ~p = ~b                                                        (3)

where C is an 8x8 matrix:

    C = Σ_{~x} X(~x)^t ∇J(~x) ∇J(~x)^t X(~x)

and ~b is an 8x1 vector:

    ~b = Σ_{~x} X(~x)^t ∇J(~x) (J(~x) − K(~x)).

This leads to the linear solution ~p = C^{-1} ~b. Note that C and ~b are constructed from measurable image quantities only; hence the term direct estimation. This process does not require recovery of any 3D information. Let ~p_i denote the estimate of the quadratic parameters at iteration i. The refined estimate ~p_{i+1} = ~p_i + δ~p can be solved
for by estimating the infinitesimal increment δ~p. After iterating a certain number of times within a pyramid level, the process continues at the next finer level, etc.

Figs. 1f and 1i show an example of applying this parametric alignment method. This is an airborne sequence taken from a large distance, hence the camera-induced motion can be described by a single 2D parametric transformation of Eq. (1). Fig. 1c was the reference image and Fig. 1b was the inspection image. Fig. 1e shows the amount of misalignment between the two input images. When the method is applied to the entire image region, it yields accurate alignment (at sub-pixel accuracy), as can be seen in Figs. 1f and 1i. However, once the same method is applied to a small image region (such as the rectangular region marked in Figs. 1g and 1j), its accuracy degrades significantly. The farther a pixel is from the region of analysis, the more misaligned it is. In the next section we show how employing multiple frames (as opposed to two) can be used to increase the accuracy of image alignment even when applied to small image regions.

4 Multi-Frame Parametric Alignment

In this section we present a method for extending the two-frame technique reviewed in Section 3 into a multi-frame technique, which exploits multi-frame constraints on the image motion of a planar surface. In Section 4.1 we derive such a multi-frame constraint, and in Section 4.2 we show how it can be incorporated into the 2D parametric estimation of planar motion, without requiring any recovery of 3D information or camera calibration.

4.1 Single Plane Geometric Constraint

Let J be a reference frame, and let K_1, ..., K_F be a sequence of F inspection frames imaging the same planar surface with the same focal length. Let ~p_1, ..., ~p_F be the corresponding quadratic parameter vectors of the planar motion (see Eq. (2)). The instantaneous motion model of Eq. (1) is a good approximation of the motion over short video segments, as the camera does not gain large motions in short periods of time. In some cases, such as airborne video, this approximation is good also
for very long sequences. Choosing the reference frame as the middle frame extends the applicability of the model to twice as many frames.

We arrange ~p_1, ..., ~p_F in an 8xF matrix P, where each column corresponds to one frame. From Eq. (2):

    P = [ ~p_1 ... ~p_F ]_{8xF} = S [ ~t_1  ...  ~t_F ]
                                    [ ~Ω_1  ...  ~Ω_F ]_{6xF}        (4)

where S is the 8x6 shape matrix (written here for the constant-calibration case f1 = f2 = f, with the per-frame motion vector ordered (tX, tY, tZ, ΩX, ΩY, ΩZ)^t):

    S = [  fγ    0     0      0     f    0  ]
        [  α     0    −γ      0     0    0  ]
        [  β     0     0      0     0   −1  ]
        [  0    fγ     0     −f     0    0  ]
        [  0     α     0      0     0    1  ]
        [  0     β    −γ      0     0    0  ]
        [  0     0   −α/f     0    1/f   0  ]
        [  0     0   −β/f   −1/f    0    0  ]                        (5)

and ~t_j, ~Ω_j are the camera translation and rotation between the reference frame J and frame K_j (j =
1, ..., F).

Figure 1: (a,b,c,d) Sample frames from the airborne video clip. Apart from the camera motion, there is also a small moving car. Fig. 1c was the reference frame. (e) The absolute differences between two input frames (b and c) indicate the initial misalignment. (f,i) High-quality alignment (at sub-pixel accuracy) from applying the two-frame technique to the entire image region: (f) shows the absolute differences after alignment, while (i) shows the average of the two aligned images. Only the independently moving car is misaligned. (g,j) Poor alignment from applying the two-frame alignment to a small image region, marked by a rectangle. Although the rectangular region is well aligned (apart from the moving car), large misalignments can be detected at image pixels which are distant from the analysis region. (h,k) High-quality alignment from applying the constrained multi-frame alignment to the same small rectangular region; it was applied simultaneously to all frames. Even pixels distant from the analysis window appear well aligned.

Note that the shape matrix S is common to all frames, because they all share the same plane normal ~n = (α, β, γ)^t and focal length f. The dimensionality of the matrices on the right-hand side of Eq. (4) implies that, without noise, the parameter matrix P is of rank at most 6. This implies that the collection of all the ~p_j's (j = 1, ..., F) resides in a low dimensional linear subspace. The actual rank of P may be even lower than 6, depending on the complexity of the camera motion over the sequence (e.g., in the case of uniform motion it will be 1).

4.2 Incorporating the Sub-Space Constraint into Multi-Frame Estimation

In this section we show how the low-rank constraint on P can be incorporated into the estimation of ~p_1, ..., ~p_F, without explicitly solving for any 3D information, nor for camera calibration. It is not advisable to first solve for P and then project its columns onto a lower dimensional subspace, because then the individual ~p_j's will already be very erroneous. Instead, we would like
to use the low dimensionality constraint to constrain the estimation of the individual ~p_j's a priori. We next show how we can apply this constraint directly to measurable image quantities, prior to solving for the individual ~p_j's.

Since all inspection frames K_1, ..., K_F share the same reference frame J, Eq. (3) can be extended to multiple frames as:

    C_{8x8} [ ~p_1 ... ~p_F ]_{8xF} = [ ~b_1 ... ~b_F ]_{8xF}       (6)

or, in short: C P = B. Eq. (6) implies that rank(B) ≤ rank(P). B contains only measurable image quantities. Therefore, instead of applying the low-rank constraint to P, we apply it directly to B, and only then solve for P. Namely: at each iteration i of the algorithm, first compute B_i = [~b_1^i ... ~b_F^i], and then project its columns onto a lower-dimensional linear subspace by seeking a matrix ^B_i of rank r (r ≤ 6) which is closest to B_i (in the Frobenius norm). Then solve for P_i = C_i^{-1} ^B_i, which yields the desired ~p_j's.

The advantage of applying the constraint to B instead of P can also be explained as follows: note that the matrix C in Eq. (3) is the posterior inverse covariance matrix of the parameter vector ~p. Therefore, applying the constraint to B is equivalent to applying it to the matrix P, but after normalizing its columns by the inverse covariance matrix C (note that all ~p_j's share the same C). To equalize the effect of the subspace projection on all matrix entries, and to further condition the numerical
process, we use the coordinate normalization technique suggested by [4] (also used in the two-frame method).

Fig. 1 shows a comparison of applying the two-frame and multi-frame alignment techniques. Figs. 1g and 1j show the result of applying the two-frame alignment technique (see Section 3) to a small image region. The region of interest (marked rectangle) is indeed aligned, but the rest of the image is completely distorted. In contrast, the multi-frame constrained alignment (applied simultaneously to all frames) successfully aligned the entire image, even though it was applied only to the same small region. This can be seen in Figs. 1h and 1k. Fig. 2 shows a quantitative comparison of the two-frame and multi-frame alignment techniques. When applying two-frame motion estimation to the small region, the farther a pixel is from the center of the region, the larger the error is. However, when applying multi-frame motion estimation to the same small region, the errors everywhere are at sub-pixel level.

Fig. 3 shows another comparison of applying two-frame alignment and multi-frame alignment to small image regions. The sequence contains frames taken by a moving camera. Because the camera is imaging the scene from a short distance, and because its motion also contains a translation, different planar surfaces (e.g., the house, the road-sign, etc.) induce different 2D parametric motions. As long as the house was not occluded, the two-frame alignment, when applied only to the house region, stabilized the house reasonably well. However, once the house was partially occluded by the road-sign and was no longer fully in the camera's field of view, the quality of the two-frame alignment degraded drastically (see Figs. 3f and 3j). The multi-frame constrained alignment, on the other hand, successfully aligned the house, even in frames where only a small portion of the house was visible (see Figs. 3g and 3k). In this case, the actual rank used was much smaller than 6, and was detected by studying the rate of decay of the singular values of the matrix B.

5 Extending to Multiple Planes
In this section we show that in scenes containing multiple planar surfaces, even stronger geometric constraints can be derived and used to improve the parametric motion estimation. We present two different multi-frame geometric constraints for sequences with multiple planes. The second constraint does not require constant camera calibration.

5.1 The Multi-Plane Rank-6 Constraint

Let Π_1, ..., Π_m be m planar surfaces with normals ~n_1 = (α_1, β_1, γ_1)^t, ..., ~n_m = (α_m, β_m, γ_m)^t, respectively. Let P_1, ..., P_m be the corresponding quadratic motion parameter matrices, and let S_1, ..., S_m be the corresponding shape matrices, as defined by Eqs. (2) and (4).

Figure 2: A quantitative comparison of two-frame and multi-frame alignment. The values in the graph correspond to the misalignments in Figs. 1g and 1h. These errors are displayed as a function of the distance from the center of the rectangular region in Fig. 1. The results of two-frame alignment applied to the entire image region (Fig. 1f) were used as ground truth.

We can stack the P_k (k = 1, ..., m) matrices to form an 8m x F matrix P, where each column corresponds to one frame. Since all planar surfaces share the same 3D camera motion between a pair of frames, we get from Eq. (4):

    P = [ P_1 ]   [ S_1 ]        [ ~t_1  ...  ~t_F ]
        [ ... ] = [ ... ]        [ ~Ω_1  ...  ~Ω_F ]_{6xF}           (7)
        [ P_m ]   [ S_m ]_{8mx6}

The dimensionality of the matrices on the right-hand side of Eq. (7) implies that, without noise, the parameter matrix P is also of rank at most 6 (as before, the actual rank of P may be even lower than 6, depending on the complexity and variability of the camera motion over the sequence).

5.2 Incorporating the Multi-Plane Rank-6 Constraint into the Estimation

Again, we would like to apply the constraint to measurable image quantities; we show next how this can be done. Let C_1, ..., C_m be the matrices corresponding to the planes Π_1, ..., Π_m. Note that the matrices C_k differ from each other due to the difference in the region of summation, which is the region of each planar surface in the reference frame. We can write:

    [ C_1          ] [ P_1 ]   [ B_1 ]
    [     ...      ] [ ... ] = [ ... ]                               (8)
    [          C_m ] [ P_m ]   [ B_m ]

or, in short: C P = B. Note that here, as opposed to Eq. (6), P, B and C contain information from all planar surfaces. Eq. (8) implies that rank(B) ≤ rank(P).
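The projection step used throughout the paper (seeking the rank-r matrix closest to B in the Frobenius norm) is exactly a truncated SVD. Below is a minimal numpy sketch of that step; a synthetic low-rank-plus-noise matrix stands in for the measured B, and a toy invertible C stands in for the block-diagonal matrix of Eq. (8) (all names and values are ours, not the authors').

```python
import numpy as np

def project_to_rank(B, r):
    """Frobenius-closest rank-r approximation of B (truncated SVD)."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

# Synthetic stand-in for B: 8m rows (m = 2 planes), F = 30 frames,
# rank 6 by construction, plus a small amount of measurement noise.
rng = np.random.default_rng(0)
B_clean = rng.normal(size=(16, 6)) @ rng.normal(size=(6, 30))   # rank 6
B = B_clean + 1e-6 * rng.normal(size=B_clean.shape)             # + noise

B_hat = project_to_rank(B, 6)          # enforce the rank-6 constraint

# Toy invertible stand-in for the block-diagonal C of Eq. (8);
# the constrained parameter matrix is then recovered as P = C^{-1} B_hat.
C = np.eye(16) + 0.1 * rng.normal(size=(16, 16))
P = np.linalg.solve(C, B_hat)
```

Projecting B (rather than P) in this way is what lets the noise be suppressed before the parameters are ever solved for, as argued in Section 4.2.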
Figure 3: (a,b,c,d,e) Sample images from the sequence. Image (b) was used as the reference frame. (f) Bad two-frame alignment of the house region between the reference frame (b) and frame (d). The frame was completely distorted because the house region was significantly occluded by the road-sign; (j) shows the same result overlayed on top of the reference image (b). (g) The corresponding result from applying the constrained multi-frame alignment. The house is now well aligned even though only a small portion of the house is visible (see overlay image (k)), while the rest of the image is not distorted. The road-sign is not aligned because it is at a different depth, and displays accurate 3D parallax. (h,l) Badly distorted two-frame alignment applied to the road-sign between frames (b) and (a) (where the sign is barely visible). Although the sign appears aligned, the rest of the image is distorted, and the house displays wrong parallax. (i,m) The corresponding result (to (h,l)) from applying the constrained multi-plane (multi-frame) alignment to the sign (see text). The house now displays accurate 3D parallax.

We thus project the columns of the matrix B onto a lower-dimensional subspace at each iteration, resulting in ^B (which is closest to B in the Frobenius norm), and then solve for P = C^{-1} ^B. In other words, we solve for all parametric motions of all planar surfaces, across all frames, simultaneously. The low-rank constraint is stronger here than in the single-plane case, because the matrix B is of a larger dimension (8m x F).

5.3 Relative Motion Rank-3 Constraint

Moreover, by looking at the relative motion of planar surfaces we can get an even stronger geometric constraint, which holds even in the case of varying camera focal length. Let f1 and f_j be the focal lengths of frame J and frame K_j, respectively. Let Π_r be an arbitrary planar surface (Π_r could be one of Π_1, ..., Π_m). Denote Δ~p_j^k = ~p_j^k − ~p_j^r (k = 1, ..., m). It is easy to see from Eq. (2) that taking the difference Δ~p_j^k eliminates all effects of camera
rotation, leaving only effects of the camera translation and the focal length:

    [ Δ~p_j^k ]_{8x1} = [ ~S_k ]_{8x3} [ ~τ_j ]_{3x1}                (9)

where ~τ_j = (f_j tX^j, f_j tY^j, tZ^j)^t, and (with the parameterization of Eq. (2)):

    ~S_k = [ (γ_k − γ_r)        0                 0          ]
           [ (α_k − α_r)/f1     0           −(γ_k − γ_r)     ]
           [ (β_k − β_r)/f1     0                 0          ]
           [      0        (γ_k − γ_r)           0          ]
           [      0        (α_k − α_r)/f1        0          ]
           [      0        (β_k − β_r)/f1   −(γ_k − γ_r)     ]
           [      0             0           −(α_k − α_r)/f1  ]
           [      0             0           −(β_k − β_r)/f1  ]

~S_k is common to all frames. The camera translation ~t_j and focal length f_j are common to all planes (between the reference frame J and frame K_j). We can therefore extend Eq. (9) to multiple planes and multiple frames as follows:

    ΔP = [ ΔP_1 ]   [ ~S_1 ]
         [ ...  ] = [ ...  ]        [ ~τ_1 ... ~τ_F ]_{3xF}          (10)
         [ ΔP_m ]   [ ~S_m ]_{8mx3}

The dimensionality of the matrices on the right-hand side of Eq. (10) implies that, without noise, the difference parameter matrix ΔP is of rank at most 3. It is possible to obtain a similar constraint (with rank 4) for the case of general homographies [13] (as opposed to the instantaneous case). The rank-4 constraint is an extension of the constraint shown by [10]: Shashua and Avidan presented a rank-4 constraint on the collection of homographies of multiple planes between a pair of frames; in our case [13] the constraints are on multiple planes across multiple frames. We refer the reader to [13] for more details. In the next section (5.4) we show how the multi-plane rank-3 constraint can be incorporated into the multi-frame estimation process to further enhance planar-motion estimation.

5.4 Incorporating the Rank-3 Constraint into Multi-Frame Estimation

Assume that for one planar surface, Π_r, we know the collection of all its parametric motions, P_r (this is either given to us, or estimated at a previous iteration). We would like to use the rank-3 constraint to refine the estimation of the collection of parametric motions P_1, ..., P_m of all other planes. Using Eq. (8) we derive:

    C ΔP = [ B_1 − C_1 P_r ]
           [      ...      ] = ΔB                                    (11)
           [ B_m − C_m P_r ]

Therefore rank(ΔB) ≤ rank(ΔP) ≤ 3. To incorporate the constraint into the estimation of the individual P_k's, we project the columns of the matrix ΔB onto a lower-dimensional (≤ 3) subspace at each iteration, resulting in Δ^B (which is closest to ΔB in the Frobenius norm). Therefore (from Eq. (11)) we can estimate a new matrix ^B:

    ^B = C P = [ C_1 P_r ]
               [   ...   ] + Δ^B                                     (12)
               [ C_m P_r ]

and then solve for P = C^{-1} ^B. Note that here ^B and ΔB are constructed from measurable image quantities (B and C), as well as from the parameters P_r, which are either known or else estimated at a previous iteration. The process is repeated at each iteration. Note: (i) if we know that the focal length does not change along the sequence, we can also apply the constraint rank(B) ≤ rank(P) ≤ 6 prior to solving for P; (ii) Π_r can alternate between the planes, but we found it to work best when Π_r was chosen to be a "dominant" plane (i.e., one whose matrix C_r is best conditioned).

Fig. 3 also presents a comparison of regular
(unconstrained) two-frame alignment with the multi-plane constrained alignment, applied to the road-sign. The motion parameters of the house region were first estimated using the single-plane multi-frame constrained alignment (see Section 4). These were then used as inputs for constraining the estimation of the sequence of 2D motion parameters of the road-sign. The two-frame alignment technique did not perform well in cases where the sign was only partially visible (see Figs. 3h and 3l). The multi-plane (multi-frame) constrained alignment, on the other hand, stabilized the sign well even in cases where the sign was only partially visible (see Figs. 3i and 3m).

References

[1] G. Adiv. Determining three-dimensional motion and structure from optical flow generated by several moving objects. PAMI, 7(4):384-401, July 1985.
[2] S. Ayer and H. Sawhney. Layered representation of motion video using robust maximum-likelihood estimation of mixture models and MDL encoding. In ICCV, 1995.
[3] J. Bergen, P. Anandan, K. Hanna, and R. Hingorani. Hierarchical model-based motion estimation. In ECCV, 1992.
[4] R. I. Hartley. In defence of the 8-point algorithm. PAMI, 19(6):580-593, June 1997.
[5] M. Irani, B. Rousso, and S. Peleg. Recovery of ego-motion using region alignment. PAMI, 19(3):268-272, March 1997.
[6] M. Irani, B. Rousso, and S. Peleg. Computing occluding and transparent motions. IJCV, 12:5-16, February 1994.
[7] S. Ju, M. J. Black, and A. D. Jepson. Multi-layer, locally affine optical flow and regularization with transparency. In CVPR, 1996.
[8] H. Longuet-Higgins. Visual ambiguity of a moving plane. Proceedings of The Royal Society of London B, 1984.
[9] H. Sawhney and R. Kumar. True multi-image alignment and its application to mosaicing and lens distortion correction. In CVPR, 1997.
[10] A. Shashua and S. Avidan. The rank 4 constraint in multiple (>= 3) view geometry. In ECCV, 1996.
[11] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.
[12] J. Wang and E. Adelson. Layered representation for motion analysis. In CVPR, 1993.
[13] L. Zelnik-Manor and M. Irani. Multi-frame subspace constraints on homographies. MSc thesis, The Weizmann Institute of Science, Israel, March 1999.
More informationLecture 16: Computer Vision
CS442/542b: Artificial ntelligence Prof. Olga Veksler Lecture 16: Computer Vision Motion Slides are from Steve Seitz (UW), David Jacobs (UMD) Outline Motion Estimation Motion Field Optical Flow Field Methods
More informationUsing a Projected Subgradient Method to Solve a Constrained Optimization Problem for Separating an Arbitrary Set of Points into Uniform Segments
Using a Projected Subgradient Method to Solve a Constrained Optimization Problem or Separating an Arbitrary Set o Points into Uniorm Segments Michael Johnson May 31, 2011 1 Background Inormation The Airborne
More informationLinear Multi-View Reconstruction of Points, Lines, Planes and Cameras using a Reference Plane
Linear Multi-View Reconstruction of Points, Lines, Planes and Cameras using a Reference Plane Carsten Rother Computational Vision and Active Perception Laboratory (CVAP) Royal Institute of Technology (KTH),
More informationGlobal Flow Estimation. Lecture 9
Global Flow Estimation Lecture 9 Global Motion Estimate motion using all pixels in the image. Parametric flow gives an equation, which describes optical flow for each pixel. Affine Projective Global motion
More informationCompositing a bird's eye view mosaic
Compositing a bird's eye view mosaic Robert Laganiere School of Information Technology and Engineering University of Ottawa Ottawa, Ont KN 6N Abstract This paper describes a method that allows the composition
More informationAlignment of Non-Overlapping Sequences
Alignment of Non-Overlapping Sequences Yaron Caspi Michal Irani Dept. of Computer Science and Applied Math The Weizmann Institute of Science 76100 Rehovot, Israel PLEASE PRINT IN COLOR Abstract This paper
More information3D Reconstruction from Scene Knowledge
Multiple-View Reconstruction from Scene Knowledge 3D Reconstruction from Scene Knowledge SYMMETRY & MULTIPLE-VIEW GEOMETRY Fundamental types of symmetry Equivalent views Symmetry based reconstruction MUTIPLE-VIEW
More informationStructure from Motion. Prof. Marco Marcon
Structure from Motion Prof. Marco Marcon Summing-up 2 Stereo is the most powerful clue for determining the structure of a scene Another important clue is the relative motion between the scene and (mono)
More informationStructure from motion
Structure from motion Structure from motion Given a set of corresponding points in two or more images, compute the camera parameters and the 3D point coordinates?? R 1,t 1 R 2,t 2 R 3,t 3 Camera 1 Camera
More informationLecture 19: Motion. Effect of window size 11/20/2007. Sources of error in correspondences. Review Problem set 3. Tuesday, Nov 20
Lecture 19: Motion Review Problem set 3 Dense stereo matching Sparse stereo matching Indexing scenes Tuesda, Nov 0 Effect of window size W = 3 W = 0 Want window large enough to have sufficient intensit
More informationZ (cm) Y (cm) X (cm)
Oceans'98 IEEE/OES Conference Uncalibrated Vision for 3-D Underwater Applications K. Plakas, E. Trucco Computer Vision Group and Ocean Systems Laboratory Dept. of Computing and Electrical Engineering Heriot-Watt
More informationCarnegie Mellon University. Pittsburgh, PA [6] require the determination of the SVD of. the measurement matrix constructed from the frame
Ecient Recovery of Low-dimensional Structure from High-dimensional Data Shyjan Mahamud Martial Hebert Dept. of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Abstract Many modeling tasks
More informationProceedings of the 6th Int. Conf. on Computer Analysis of Images and Patterns. Direct Obstacle Detection and Motion. from Spatio-Temporal Derivatives
Proceedings of the 6th Int. Conf. on Computer Analysis of Images and Patterns CAIP'95, pp. 874-879, Prague, Czech Republic, Sep 1995 Direct Obstacle Detection and Motion from Spatio-Temporal Derivatives
More informationMultiple Motion Scene Reconstruction from Uncalibrated Views
Multiple Motion Scene Reconstruction from Uncalibrated Views Mei Han C & C Research Laboratories NEC USA, Inc. meihan@ccrl.sj.nec.com Takeo Kanade Robotics Institute Carnegie Mellon University tk@cs.cmu.edu
More informationVision par ordinateur
Epipolar geometry π Vision par ordinateur Underlying structure in set of matches for rigid scenes l T 1 l 2 C1 m1 l1 e1 M L2 L1 e2 Géométrie épipolaire Fundamental matrix (x rank 2 matrix) m2 C2 l2 Frédéric
More informationPerception and Action using Multilinear Forms
Perception and Action using Multilinear Forms Anders Heyden, Gunnar Sparr, Kalle Åström Dept of Mathematics, Lund University Box 118, S-221 00 Lund, Sweden email: {heyden,gunnar,kalle}@maths.lth.se Abstract
More informationA Robust Subspace Approach to Extracting Layers from Image Sequences
A Robust Subspace Approach to Extracting Layers from Image Sequences Qifa Ke CMU-CS-03-173 August 1, 2003 School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Submitted in partial
More informationReal-Time Motion Analysis with Linear-Programming Λ
Real-ime Motion Analysis with Linear-Programming Λ Moshe Ben-Ezra Shmuel Peleg Michael Werman Institute of Computer Science he Hebrew University of Jerusalem 91904 Jerusalem, Israel Email: fmoshe, peleg,
More informationComputer Vision I Name : CSE 252A, Fall 2012 Student ID : David Kriegman Assignment #1. (Due date: 10/23/2012) x P. = z
Computer Vision I Name : CSE 252A, Fall 202 Student ID : David Kriegman E-Mail : Assignment (Due date: 0/23/202). Perspective Projection [2pts] Consider a perspective projection where a point = z y x P
More informationDistribution Fields with Adaptive Kernels for Large Displacement Image Alignment
MEARS et al.: DISTRIBUTION FIELDS WITH ADAPTIVE KERNELS 1 Distribution Fields with Adaptive Kernels or Large Displacement Image Alignment Benjamin Mears bmears@cs.umass.edu Laura Sevilla Lara bmears@cs.umass.edu
More informationA unified approach for motion analysis and view synthesis Λ
A unified approach for motion analysis and view synthesis Λ Alex Rav-Acha Shmuel Peleg School of Computer Science and Engineering The Hebrew University of Jerusalem 994 Jerusalem, Israel Email: falexis,pelegg@cs.huji.ac.il
More informationRobust Model-Free Tracking of Non-Rigid Shape. Abstract
Robust Model-Free Tracking of Non-Rigid Shape Lorenzo Torresani Stanford University ltorresa@cs.stanford.edu Christoph Bregler New York University chris.bregler@nyu.edu New York University CS TR2003-840
More informationIntroduction to Computer Vision
Introduction to Computer Vision Michael J. Black Nov 2009 Perspective projection and affine motion Goals Today Perspective projection 3D motion Wed Projects Friday Regularization and robust statistics
More informationA Robust Two Feature Points Based Depth Estimation Method 1)
Vol.31, No.5 ACTA AUTOMATICA SINICA September, 2005 A Robust Two Feature Points Based Depth Estimation Method 1) ZHONG Zhi-Guang YI Jian-Qiang ZHAO Dong-Bin (Laboratory of Complex Systems and Intelligence
More informationToday. Stereo (two view) reconstruction. Multiview geometry. Today. Multiview geometry. Computational Photography
Computational Photography Matthias Zwicker University of Bern Fall 2009 Today From 2D to 3D using multiple views Introduction Geometry of two views Stereo matching Other applications Multiview geometry
More informationFeature-Based Sequence-to-Sequence Matching
Feature-Based Sequence-to-Sequence Matching Yaron Caspi The Institute of Computer Science The Hebrew University of Jerusalem Jerusalem 91904, Israel Denis Simakov and Michal Irani Dept. of Computer Science
More informationHand-Eye Calibration from Image Derivatives
Hand-Eye Calibration from Image Derivatives Abstract In this paper it is shown how to perform hand-eye calibration using only the normal flow field and knowledge about the motion of the hand. The proposed
More informationChapter 3 Image Enhancement in the Spatial Domain
Chapter 3 Image Enhancement in the Spatial Domain Yinghua He School o Computer Science and Technology Tianjin University Image enhancement approaches Spatial domain image plane itsel Spatial domain methods
More informationOptical flow and tracking
EECS 442 Computer vision Optical flow and tracking Intro Optical flow and feature tracking Lucas-Kanade algorithm Motion segmentation Segments of this lectures are courtesy of Profs S. Lazebnik S. Seitz,
More informationMulti-stable Perception. Necker Cube
Multi-stable Perception Necker Cube Spinning dancer illusion, Nobuyuki Kayahara Multiple view geometry Stereo vision Epipolar geometry Lowe Hartley and Zisserman Depth map extraction Essential matrix
More informationA Bottom Up Algebraic Approach to Motion Segmentation
A Bottom Up Algebraic Approach to Motion Segmentation Dheeraj Singaraju and RenéVidal Center for Imaging Science, Johns Hopkins University, 301 Clark Hall, 3400 N. Charles St., Baltimore, MD, 21218, USA
More informationC280, Computer Vision
C280, Computer Vision Prof. Trevor Darrell trevor@eecs.berkeley.edu Lecture 11: Structure from Motion Roadmap Previous: Image formation, filtering, local features, (Texture) Tues: Feature-based Alignment
More informationBIL Computer Vision Apr 16, 2014
BIL 719 - Computer Vision Apr 16, 2014 Binocular Stereo (cont d.), Structure from Motion Aykut Erdem Dept. of Computer Engineering Hacettepe University Slide credit: S. Lazebnik Basic stereo matching algorithm
More informationTEMPORAL SYNCHRONIZATION FROM CAMERA MOTION. Lisa Spencer and Mubarak Shah. School of Computer Science University of Central Florida
TEMPORAL SYNCHRONIZATION FROM CAMERA MOTION Lisa Spencer and Mubarak Shah School of Computer Science University of Central Florida ABSTRACT This paper presents a method to recover the temporal synchronization
More informationOptical Flow Estimation
Optical Flow Estimation Goal: Introduction to image motion and 2D optical flow estimation. Motivation: Motion is a rich source of information about the world: segmentation surface structure from parallax
More informationFrom Reference Frames to Reference Planes: Multi-View Parallax Geometry and Applications
From Reference Frames to Reference Planes: Multi-View Parallax Geometry and Applications M. Irani 1, P. Anandan 2, and D. Weinshall 3 1 Dept. of Applied Math and CS, The Weizmann Inst. of Science, Rehovot,
More informationThe end of affine cameras
The end of affine cameras Affine SFM revisited Epipolar geometry Two-view structure from motion Multi-view structure from motion Planches : http://www.di.ens.fr/~ponce/geomvis/lect3.pptx http://www.di.ens.fr/~ponce/geomvis/lect3.pdf
More informationMotion. 1 Introduction. 2 Optical Flow. Sohaib A Khan. 2.1 Brightness Constancy Equation
Motion Sohaib A Khan 1 Introduction So far, we have dealing with single images of a static scene taken by a fixed camera. Here we will deal with sequence of images taken at different time intervals. Motion
More information1-2 Feature-Based Image Mosaicing
MVA'98 IAPR Workshop on Machine Vision Applications, Nov. 17-19, 1998, Makuhari, Chibq Japan 1-2 Feature-Based Image Mosaicing Naoki Chiba, Hiroshi Kano, Minoru Higashihara, Masashi Yasuda, and Masato
More informationMultiple View Geometry in Computer Vision Second Edition
Multiple View Geometry in Computer Vision Second Edition Richard Hartley Australian National University, Canberra, Australia Andrew Zisserman University of Oxford, UK CAMBRIDGE UNIVERSITY PRESS Contents
More informationTwo-view geometry Computer Vision Spring 2018, Lecture 10
Two-view geometry http://www.cs.cmu.edu/~16385/ 16-385 Computer Vision Spring 2018, Lecture 10 Course announcements Homework 2 is due on February 23 rd. - Any questions about the homework? - How many of
More informationGeometric Registration for Deformable Shapes 2.2 Deformable Registration
Geometric Registration or Deormable Shapes 2.2 Deormable Registration Variational Model Deormable ICP Variational Model What is deormable shape matching? Example? What are the Correspondences? Eurographics
More informationStereo II CSE 576. Ali Farhadi. Several slides from Larry Zitnick and Steve Seitz
Stereo II CSE 576 Ali Farhadi Several slides from Larry Zitnick and Steve Seitz Camera parameters A camera is described by several parameters Translation T of the optical center from the origin of world
More informationThe Template Update Problem
The Template Update Problem Iain Matthews, Takahiro Ishikawa, and Simon Baker The Robotics Institute Carnegie Mellon University Abstract Template tracking dates back to the 1981 Lucas-Kanade algorithm.
More informationStructure from Motion
Structure from Motion Outline Bundle Adjustment Ambguities in Reconstruction Affine Factorization Extensions Structure from motion Recover both 3D scene geoemetry and camera positions SLAM: Simultaneous
More informationFeature Tracking and Optical Flow
Feature Tracking and Optical Flow Prof. D. Stricker Doz. G. Bleser Many slides adapted from James Hays, Derek Hoeim, Lana Lazebnik, Silvio Saverse, who 1 in turn adapted slides from Steve Seitz, Rick Szeliski,
More informationMidterm Exam Solutions
Midterm Exam Solutions Computer Vision (J. Košecká) October 27, 2009 HONOR SYSTEM: This examination is strictly individual. You are not allowed to talk, discuss, exchange solutions, etc., with other fellow
More informationMulti-body Factorization With Uncertainty: Revisiting Motion Consistency
Multi-body Factorization With Uncertainty: Revisiting Motion Consistency Lihi Zelnik-Manor (lihi@vision.caltech.edu) California Institute of Technology Pasadena, CA, USA Moshe Machline and Michal Irani
More informationVertical Parallax from Moving Shadows
Vertical Parallax from Moving Shadows Yaron Caspi Faculty of Mathematics and Computer Science The Weizmann Institute of Science Rehovot, 76100, Israel Michael Werman School of Computer Science and Engineering
More informationCombining Two-view Constraints For Motion Estimation
ombining Two-view onstraints For Motion Estimation Venu Madhav Govindu Somewhere in India venu@narmada.org Abstract In this paper we describe two methods for estimating the motion parameters of an image
More informationStereo Vision. MAN-522 Computer Vision
Stereo Vision MAN-522 Computer Vision What is the goal of stereo vision? The recovery of the 3D structure of a scene using two or more images of the 3D scene, each acquired from a different viewpoint in
More informationRecovering structure from a single view Pinhole perspective projection
EPIPOLAR GEOMETRY The slides are from several sources through James Hays (Brown); Silvio Savarese (U. of Michigan); Svetlana Lazebnik (U. Illinois); Bill Freeman and Antonio Torralba (MIT), including their
More informationImage Transfer Methods. Satya Prakash Mallick Jan 28 th, 2003
Image Transfer Methods Satya Prakash Mallick Jan 28 th, 2003 Objective Given two or more images of the same scene, the objective is to synthesize a novel view of the scene from a view point where there
More informationCapturing, Modeling, Rendering 3D Structures
Computer Vision Approach Capturing, Modeling, Rendering 3D Structures Calculate pixel correspondences and extract geometry Not robust Difficult to acquire illumination effects, e.g. specular highlights
More informationFeature-Based Image Mosaicing
Systems and Computers in Japan, Vol. 31, No. 7, 2000 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J82-D-II, No. 10, October 1999, pp. 1581 1589 Feature-Based Image Mosaicing Naoki Chiba and
More informationMETRIC PLANE RECTIFICATION USING SYMMETRIC VANISHING POINTS
METRIC PLANE RECTIFICATION USING SYMMETRIC VANISHING POINTS M. Lefler, H. Hel-Or Dept. of CS, University of Haifa, Israel Y. Hel-Or School of CS, IDC, Herzliya, Israel ABSTRACT Video analysis often requires
More informationComputational Optical Imaging - Optique Numerique. -- Single and Multiple View Geometry, Stereo matching --
Computational Optical Imaging - Optique Numerique -- Single and Multiple View Geometry, Stereo matching -- Autumn 2015 Ivo Ihrke with slides by Thorsten Thormaehlen Reminder: Feature Detection and Matching
More informationFeature Tracking and Optical Flow
Feature Tracking and Optical Flow Prof. D. Stricker Doz. G. Bleser Many slides adapted from James Hays, Derek Hoeim, Lana Lazebnik, Silvio Saverse, who in turn adapted slides from Steve Seitz, Rick Szeliski,
More informationRectification and Distortion Correction
Rectification and Distortion Correction Hagen Spies March 12, 2003 Computer Vision Laboratory Department of Electrical Engineering Linköping University, Sweden Contents Distortion Correction Rectification
More informationVC 11/12 T11 Optical Flow
VC 11/12 T11 Optical Flow Mestrado em Ciência de Computadores Mestrado Integrado em Engenharia de Redes e Sistemas Informáticos Miguel Tavares Coimbra Outline Optical Flow Constraint Equation Aperture
More informationextracted occurring from the spatial and temporal changes in an image sequence. An image sequence
Motion: Introduction are interested in the visual information that can be We from the spatial and temporal changes extracted in an image sequence. An image sequence occurring of a series of images (frames)
More information