Homography-Based 3D Scene Analysis of Video Sequences *

Mei Han    Takeo Kanade
meihan@cs.cmu.edu    tk@cs.cmu.edu
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213

* This work was funded by DARPA under Contract Number DAAB07-97-C-J031.

Abstract

We propose a framework to recover projective depth based on image homography and discuss its application to scene analysis of video sequences. We describe a robust homography algorithm which incorporates contrast/brightness adjustment and robust estimation into image registration. We present a camera motion solver to obtain the ego-motion and the real/virtual plane position from homography. We then apply the Levenberg-Marquardt method to generate a dense projective depth map. We also discuss temporal integration over video sequences. Finally, we present the results of applying the homography-based video analysis to motion detection.

1 Introduction

Temporal information redundancy of video sequences allows us to use efficient, incremental methods which perform temporal integration of information for gradual refinement. Approaches handling 3D scene analysis of video sequences with camera motion can be classified into two categories: algorithms which use 2D transformation or model fitting, and algorithms which use 3D geometry analysis. Video sequences of our interest are taken from a moving airborne platform where the ego-motion is complex and the scene is relatively distant but not necessarily flat; therefore, an integration of 2D and 3D algorithms is more appropriate. The layered approach [Baker et al., 1998] has advantages in dealing with this kind of scenario, but layer segmentation remains a problem. Approaches of structure from motion are mostly feature-based and cannot provide dense depth maps. The flow-based method [Xiong and Shafer, 1995] recovers dense shape via the Kalman filter, but image correspondences are required. Combining 3D geometry into 2D constraints is widely used in motion detection and segmentation [Irani and Anandan, 1997, Shashua and Werman, 1995].
The plane plus parallax method contributes a great deal to ego-motion computation [Irani et al., 1994], parallax geometry analysis [Irani and Anandan, 1996] and applications to video indexing [Irani and Anandan, 1998].

Our framework first calculates image homography between consecutive images, since the camera-to-scene distance is relatively large and therefore the first-order approximation of the scene can be planar. Section 2 describes three components to achieve robust homography: contrast/brightness adjustment, progressive complexity of transformation and robust estimation. Based on the homography, a camera motion solver is presented in Section 3 to compute ego-motion and plane equation; optimization can then be performed to recover the dense projective depth map of the environment. Temporal integration is performed over video sequences to refine the projective depth. The results of applying the homography-based video analysis to motion detection are discussed in Section 4.

2 Robust Homography

Video sequences from a camera with ego-motion, especially sequences taken from a moving airborne platform, usually include lighting and environmental changes. Contrast and brightness adjustment is therefore critical in image registration. Homography between images is based on the assumption that either the scene is planar or the camera is only undergoing rotation and/or zooms; however, many video sequences are taken with no restriction on camera motion and without dominant planes. Therefore, it is necessary to use statistical techniques to obtain robust homography. We incorporate contrast/brightness adjustment and robust estimation into image registration to generate dominant homographies for complex environments.

2.1 Homography with Image Intensity Adjustment

Homography defines the relationship between two images by an eight-parameter perspective transformation:

$$ x' = \begin{pmatrix} u' \\ v' \\ 1 \end{pmatrix} \sim P x = \begin{pmatrix} p_0 & p_1 & p_2 \\ p_3 & p_4 & p_5 \\ p_6 & p_7 & p_8 \end{pmatrix} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} $$

where $x = (u, v, 1)^T$ and $x' = (u', v', 1)^T$ are homogeneous coordinates, and $\sim$ indicates equality up to scale. Szeliski and Shum [1997] gave a simple solution for the transformation, on which we design our registration algorithm.

Due to the difference of viewpoints and change of lighting, video sequences may have different intensity levels from frame to frame. We model the change between images as a linear transformation [Lucas and Kanade, 1981]:

$$ I_0(x) = \alpha I_1(x') + \beta $$

where $\alpha$ stands for contrast change, $\beta$ for brightness change, and $x$ and $x'$ for corresponding pixels in the two images. Combining this with the general homography computation, we obtain

$$ E(D; \alpha, \beta) = \sum_{x} \left[ I_0(x) - \alpha \tilde{I}_1(x'') - \beta \right]^2 $$

where $\tilde{I}_1$ is the warped image of $I_1$ by $P$, $D$ is the incremental update for $P$: $(I + D)P \to P$, and $x'' \sim (I + D) P x$ is calculated by updating the transformation. Through this representation, we can minimize the error metric using a symmetric positive definite (SPD) solver such as Cholesky decomposition, which is time efficient.

2.2 Progressive Transformation Complexity

Homography is computed hierarchically, where estimates from coarser levels of the pyramid are used to initialize the registration at finer levels [Anandan, 1989, Bergen et al., 1992].
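As an illustration of the coarse-to-fine initialization, the sketch below shows one common way to carry a homography estimated at a coarse pyramid level to the next finer level. It is not taken from the paper: it assumes that pixel coordinates scale by exactly 2 between levels (half-pixel offsets are ignored), and the names `upscale_homography` and `apply_homography` as well as the numeric values are hypothetical.

```python
import numpy as np

def upscale_homography(P_coarse, factor=2.0):
    """Re-express a 3x3 homography in the pixel coordinates of a finer level.

    Assumption (not from the paper): coordinates at the finer level are
    `factor` times those at the coarser level, so x_fine = S x_coarse with
    S = diag(factor, factor, 1), and therefore P_fine = S P_coarse S^-1.
    """
    S = np.diag([factor, factor, 1.0])
    P_fine = S @ P_coarse @ np.linalg.inv(S)
    return P_fine / P_fine[2, 2]  # remove the free scale so that p8 = 1

def apply_homography(P, u, v):
    """Map pixel (u, v) through P and divide out the homogeneous scale."""
    x = P @ np.array([u, v, 1.0])
    return x[:2] / x[2]

# A made-up coarse-level homography, for illustration only.
P_coarse = np.array([[1.02, 0.01, 3.0],
                     [-0.01, 0.98, -2.0],
                     [1e-4, 2e-4, 1.0]])
P_fine = upscale_homography(P_coarse)

# A point mapped at the fine level lands at twice the coarse-level result.
coarse_pt = apply_homography(P_coarse, 10.0, 20.0)
fine_pt = apply_homography(P_fine, 20.0, 40.0)
```

The rescaled matrix only serves as the starting value at the finer level; the registration at that level then refines it.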
To decrease the likelihood of the minimization process converging to local minima, and to improve registration speed, we use different transformations with progressive complexity: translation (2 parameters) at the coarsest level, then scaled rotation plus translation (4 parameters), affine (6 parameters), and perspective (8 parameters). The progressive method improves the robustness and stability of homography computation.

2.3 Robust Estimation

To deal with scenes without dominant planes and/or with a certain percentage of textureless areas, robust estimation is used to compute homography. The random sample consensus paradigm (RANSAC) [Fischler and Bolles, 1981] is an early example of robust estimation. Geometric statistics were also explored in motion problems [Torr and Murray, 1997, Kanatani, 1997]. We apply the RANSAC scheme to homography computation by randomly choosing a small subset of the images to obtain an initial homography solution, i.e., the subset defines a real/virtual plane, and then identifying the outliers, which are the points not lying on the plane. The process is repeated enough times on different subsets, and the best solution is the homography which maximizes the number of points lying on the plane. Points which are not identified as outliers are combined to obtain a final homography.

The three components (image intensity adjustment, progressive transformation complexity, and robust estimation) are used in combination to achieve robust homography. Figure 1(a) and (b) give two aerial images taken under different lighting conditions. Robust estimation randomly chooses 20 subsets, each of which is equal to 5 percent of the whole image. Each subset generates a homography. The best homography has the largest support area in the image; this area is used to compute the final homography. In this example, the support area for the final homography consists of the tops of several short buildings rather than the real ground, because the ground is not actually flat. White dots in Figure 1(c) show the outliers of the final homography. It can be seen that they correspond to tops of tall buildings (closer than the dominant plane) and part of the ground (farther than the plane).

Figure 1: Robust homography and projective depth. (a) first image; (b) second image; (c) outliers of robust estimation; (d) projective depth (darker denotes farther).

3 Recovery of Projective Depth

3.1 Projective Depth and Homography

Let $x = (u, v, 1)^T$ and $x' = (u', v', 1)^T$ denote homogeneous coordinates of corresponding pixels in two images; the corresponding scene point can be represented by the homogeneous coordinate $(u, v, 1, w)$ in the 3D coordinate system of the first image, and $p = (u/w, v/w, 1/w)$, where $w$ is the projective depth of point $p$ [Szeliski, 1996]. $p'$ denotes the same scene point with respect to the second image coordinate system,

$$ p' = R p + T $$

where $R$ represents the rotation between the two image coordinate systems and $T$ represents the 3D translation between the two views expressed in the second image coordinate system. By using

$$ V = \begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad V' = \begin{pmatrix} f' & 0 & 0 \\ 0 & f' & 0 \\ 0 & 0 & 1 \end{pmatrix} $$

to represent the projections of the two images, with focal lengths $f$ and $f'$, we obtain

$$ x' \sim V' p' = V' R p + V' T \sim V' R V^{-1} x + w V' T $$

Each 3D planar surface can be represented by a 3-vector $(a, b, c)$, which is the scaled normal direction whose size denotes the inverse of the distance to the plane from the origin. If $p$ is on the plane, that is, $(a, b, c)\, p = 1$, we get

$$ (a, b, c)\, V^{-1} x = w $$

so

$$ x' \sim V' \left( R + T (a, b, c) \right) V^{-1} x \sim P x $$

where $P$ is the homography we obtain from the two images.

3.2 Camera Motion Solver

Robust image registration gives an accurate estimation of the dominant homography between two images. The support region (non-outliers of RANSAC) corresponds to a real or virtual planar surface in the scene.
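The relation $x' \sim V'(R + T(a, b, c))V^{-1}x$ from Section 3.1 can be checked numerically. The sketch below is a minimal illustration and not the authors' implementation: it takes $x \sim V p$ as the projection, all numeric values are made up, and the helper names (`rotation_from_euler`, `plane_homography`, `project`) are hypothetical.

```python
import numpy as np

def rotation_from_euler(rx, ry, rz):
    """Rotation matrix Rz @ Ry @ Rx from three Euler angles (radians)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def plane_homography(f, f_prime, R, T, n):
    """P = V' (R + T n^T) V^-1 for the plane n . p = 1."""
    V = np.diag([f, f, 1.0])
    V_prime = np.diag([f_prime, f_prime, 1.0])
    return V_prime @ (R + np.outer(T, n)) @ np.linalg.inv(V)

def project(V, p):
    """Project a 3D point with x ~ V p and normalize to (u, v, 1)."""
    x = V @ p
    return x / x[2]

# Made-up camera motion, focal lengths and plane, for illustration only.
f, f_prime = 500.0, 520.0
R = rotation_from_euler(0.01, -0.02, 0.03)
T = np.array([0.2, -0.1, 0.05])
n = np.array([0.01, 0.02, 0.1])  # scaled normal (a, b, c)

# Pick a 3D point on the plane: choose X, Y and solve n . p = 1 for Z.
X, Y = 1.5, -0.7
Z = (1.0 - n[0] * X - n[1] * Y) / n[2]
p = np.array([X, Y, Z])
p_prime = R @ p + T

x = project(np.diag([f, f, 1.0]), p)
x_prime = project(np.diag([f_prime, f_prime, 1.0]), p_prime)

# Transfer x with the homography and normalize the homogeneous scale.
P = plane_homography(f, f_prime, R, T, n)
x_transferred = P @ x
x_transferred = x_transferred / x_transferred[2]
```

For a point on the plane, `x_transferred` coincides with `x_prime`; a point off the plane would show a residual displacement, which is the parallax that the depth recovery uses.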
Given focal lengths (refer to Section 4 for recovery of unknown focal lengths from video sequences), the camera motion and plane equation can be solved directly by the following equation:

$$ P \sim V' \left( R + T (a, b, c) \right) V^{-1} $$

$R$ can be expressed by Euler angles, which have 3 variables; $T$ and the plane distance are up to scale and therefore have 5 variables. Since the Euler representation of $R$ is non-linear, the Levenberg-Marquardt method is used to solve the above equation. As the number of variables (8 parameters) is small, the optimization process is rapid.

3.3 Projective Depth Solver

The camera motion solver provides the rotation and translation of the two image coordinate systems, that is, we have

$$ x' \sim M x + w t $$

where $M = V' R V^{-1}$ and $t = V' T$ are known. The Levenberg-Marquardt method is used here to minimize:

$$ E(w) = \sum_{x} \left[ I_0(x) - \alpha I_1(M x + w t) - \beta \right]^2 $$

Assuming that the projective depths of different pixels are independent, we get a diagonal Hessian matrix, which makes the optimization process more efficient. The hierarchical framework used in homography computation is also applied here.

To decrease the possibility of converging to local minima and to improve the efficiency, we use patch-based depth recovery and local search. The image is divided into small patches. Each patch shares the same depth, while the patch Jacobian is the sum of the Jacobians of the pixels in the patch. When patch displacement exceeds a certain scale, even the multilevel depth recovery fails. To overcome this problem, local search is performed at each patch for subpixel displacement. This displacement is used to solve $w$ directly, and the solution is incorporated into the optimization as an initial value. Figure 1(d) gives the result of projective depth recovery from only the two images in Figure 1(a) and (b). The patch size is 2 × 2 pixels and the local search area is 7 × 7 pixels.

4 Temporal Integration in Video Sequences

A video sequence stores a large amount of redundant information about the scene in the form of temporal consistency. We use temporal integration over video sequences to refine the projective depth and apply it to motion detection.

4.1 Depth Integration

From each pair of images, we recover the projective depth $w$ represented in the first image coordinate system. It is necessary to propagate this depth representation to the second coordinate system so that temporal integration can be performed on the recovered depth. From

$$ x' \sim V' R V^{-1} x + w V' T $$

we get

$$ x \sim V R^{-1} V'^{-1} x' + w' V (-R^{-1} T) $$

Writing the equalities up to scale with explicit scale factors $k'$ and $k$,

$$ k' x' = V' R V^{-1} x + w V' T $$
$$ k x = V R^{-1} V'^{-1} x' + w' V (-R^{-1} T) $$

we obtain

$$ (k k' - 1)\, x' = (k w - w')\, V' T $$

$V' T$ is the camera motion, which is the same for all pixels; therefore, $k k' = 1$ and

$$ w' = k w = \frac{w}{k'} $$

In this way, we represent the depth in the second image coordinate system, and we can then refine this depth by the next pair of images, consisting of the second and the third images.
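The propagation rule $w' = w / k'$ can be illustrated with a short numerical sketch. This is an illustration under stated assumptions rather than the authors' code: the projection is taken as $x \sim V p$ with $w = 1/Z$, the motion and focal lengths are made up, and the function name `propagate_depth` is hypothetical.

```python
import numpy as np

def propagate_depth(x, w, M, t):
    """Transfer pixel x = (u, v, 1) with projective depth w to the second view.

    M x + w t equals k' x', so k' is its third component and the depth in
    the second coordinate system is w' = w / k'.
    """
    y = M @ x + w * t
    k_prime = y[2]
    return y / k_prime, w / k_prime

# Made-up motion and focal lengths, for illustration only.
f, f_prime = 500.0, 520.0
V = np.diag([f, f, 1.0])
V_prime = np.diag([f_prime, f_prime, 1.0])
angle = 0.02  # small rotation about the y axis, in radians
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
T = np.array([0.3, -0.1, 0.4])

M = V_prime @ R @ np.linalg.inv(V)
t = V_prime @ T

# A scene point in the first coordinate system and its image and depth.
p = np.array([1.0, -2.0, 8.0])
x = V @ p / p[2]
w = 1.0 / p[2]

x_prime, w_prime = propagate_depth(x, w, M, t)

# Ground truth computed from the 3D point in the second coordinate system.
p_prime = R @ p + T
x_prime_true = V_prime @ p_prime / p_prime[2]
w_prime_true = 1.0 / p_prime[2]
```

Under these assumptions the propagated values agree with the ground truth, so a depth map estimated for one pair can be carried into the coordinate system of the next pair before it is refined.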
This process is repeated over the entire video sequence.

4.2 Plane Integration

The first pair of images gives a plane equation from the dominant homography. The plane equation is actually up to scale with the translation parameters. This is the reason why the same scale must be maintained for the same plane in the succeeding pairs in order to refine the current depth. We need to propagate the plane equation representation from the first image coordinate system to the second one. Let $N = (a, b, c)^T$ and $N' = (a', b', c')^T$ denote the equations of the same plane in the two coordinate systems. Since they are scaled normal directions,

$$ N' = \lambda R N $$

where $R$ is the rotation between the two coordinate systems and $\lambda$ is the scale between the two normal directions. For a point $p = (x, y, z)^T$ expressed in the first coordinate system, we get

$$ N^T p = 1 \quad \text{and} \quad N'^T (R p + T) = 1 $$

$$ N'^T R p = 1 - N'^T T $$

$$ \lambda N^T p = 1 - \lambda N^T R^T T $$

$$ \lambda = 1 - \lambda N^T R^T T $$

$$ \lambda = \frac{1}{1 + N^T R^T T} $$
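The scale relation just derived can be verified numerically. The sketch below is a minimal check with made-up values, not the authors' implementation; the function name `propagate_plane` is hypothetical.

```python
import numpy as np

def propagate_plane(N, R, T):
    """Express the plane N . p = 1 in the second coordinate system.

    Uses N' = lambda * R N with lambda = 1 / (1 + N^T R^T T).
    """
    lam = 1.0 / (1.0 + N @ R.T @ T)
    return lam * (R @ N), lam

# Made-up plane and camera motion, for illustration only.
N = np.array([0.01, 0.02, 0.1])
angle = 0.03  # small rotation about the x axis, in radians
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(angle), -np.sin(angle)],
              [0.0, np.sin(angle), np.cos(angle)]])
T = np.array([0.2, -0.1, 0.05])

N_prime, lam = propagate_plane(N, R, T)

# Sample points on the plane in the first system and move them to the second.
residuals = []
for X, Y in [(0.0, 0.0), (1.5, -0.7), (-3.0, 2.0)]:
    Z = (1.0 - N[0] * X - N[1] * Y) / N[2]
    p = np.array([X, Y, Z])
    p_prime = R @ p + T
    residuals.append(N_prime @ p_prime - 1.0)  # zero if p' lies on the plane
```

All residuals vanish up to rounding error, which confirms that the propagated vector describes the same plane after the camera motion.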

$\lambda$ gives the position of the plane in the second coordinate system after propagation, so that we can adjust the scale of the next camera motion solver to maintain the plane at the same position.

4.3 Recovery of Focal Length

The tutorial [Mohr and Triggs, 1996] summarizes the projective geometry approaches in structure from motion, concluding that when internal parameters are constant, three images are enough to recover the Euclidean shape. Pollefeys et al. [1998] demonstrated that if the skew parameter equals zero, three images are sufficient to recover Euclidean shape even with varying internal parameters. In our work, we assume the other internal parameters to be known, except the focal length. Each homography has 8 parameters, which include information on rotation (3 parameters) and translation (3 parameters) of consecutive images. Given the initial values of the first two focal lengths, we can obtain the dominant plane equation from the camera motion solver. The plane equation is propagated to the following images and can then be used to solve focal lengths (2 parameters) from homography in the same way as solving camera motion.

4.4 Application to Motion Detection

Detecting moving objects in a video sequence taken from a moving camera is an important task in video sequence analysis. Some algorithms work well in 2D situations, when the scene can be approximated by a flat surface and/or when the camera is undergoing only rotations and zooms; some apply to scenes where large depth variations are present. [Irani and Anandan, 1997] discusses a unified approach handling both 2D and 3D scenes. Our goal is motion detection in aerial images while the camera experiences complex ego-motion and the scene can neither be classified as a flat surface nor provide significant depth variations.

Figure 2(a) shows three images in the video sequence provided by the Video Surveillance and Monitoring (VSAM) project of CMU. The sequence was taken from an airplane flying above a bridge; two cars were moving on the bridge and one car was moving on the road which is far below the bridge. We first obtained homography to register consecutive images in the video sequence. Figure 2(b) gives the difference images between consecutive registered images. White dots indicate differences which are actually the outliers of homographies; we can observe that the ground below the bridge was selected as the dominant plane by robust estimation. Also, we can see that both motion (moving cars) and parallax (the bridge) appear in the difference images. Based on the homographies, we recovered projective depth by temporal integration over 7 images and used that to register consecutive images again. Figure 2(c) shows the recovered depth. It can be seen that the projective depth was improved over the sequence; the projective depth for the seventh image shows the scene structure, including the bridge in front and the road along the gully. New difference images (Figure 2(d)) were generated between registered images with depth compensation. We can see that differences due to depth are cleaned up and the white dots represent the motion only. Cars on the bridge and on the road below are detected and tracked correctly. However, in a situation where the motion of the object always satisfies the epipolar constraints, the object is classified as a stationary rigid body.

Figure 2: Motion detection. (a) original images (first, seventh and eleventh image); (b) difference between registered images (first and seventh difference); (c) projective depth, darker denotes farther (first and seventh depth); (d) motion detection, i.e. difference between registered images after depth compensation (first and seventh difference after depth compensation).

5 Conclusion

We have presented a framework for homography-based projective depth recovery and its application to motion detection. We described a robust homography algorithm which incorporates image contrast/brightness adjustment and robust estimation into image registration. Based on the homography between two images, our camera motion solver gives the solution of ego-motion and plane equation; the solution is refined to generate projective depth for each pixel by the Levenberg-Marquardt method. We also discussed temporal integration of projective depth recovery and its application to motion detection. The encouraging temporal integration results motivate us to expand this work to include spatial integration as well. Other application tasks such as 3D mosaicking, background model recovery and video editing are promising areas to explore.

Acknowledgements

We would like to thank Bob Collins, Alan Lipton, Teck Khim and Wei Hua for fruitful discussions.

References

[Anandan, 1989] P. Anandan. A Computational Framework and an Algorithm for the Measurement of Visual Motion, IJCV 2(3), pp. 283-310, 1989.
[Baker et al., 1998] S. Baker, R. Szeliski and P. Anandan. A Layered Approach to Stereo Reconstruction, Proc. of CVPR 98, pp. 434-441, 1998.
[Bergen et al., 1992] J. R. Bergen, P. Anandan, K. J. Hanna and R. Hingorani. Hierarchical Model-based Motion Estimation, Proc. of ECCV 92, pp. 237-252, 1992.
[Fischler and Bolles, 1981] M. A. Fischler and R. C. Bolles. Random Sample Consensus: A Paradigm for Model Fitting with Application to Image Analysis and Automated Cartography, Commun. Assoc. Comp. Mach., vol. 24, 1981.
[Irani et al., 1994] M. Irani, B. Rousso and S. Peleg. Recovery of Ego-Motion Using Image Stabilization, Proc. of CVPR 94, pp. 454-460, 1994.
[Irani and Anandan, 1996] M. Irani and P. Anandan. Parallax Geometry of Pairs of Points for 3D Scene Analysis, Proc. of ECCV, 1996.
[Irani and Anandan, 1997] M. Irani and P. Anandan. A Unified Approach to Moving Object Detection in 2D and 3D Scenes, PAMI(20), No. 6, pp. 577-589, June 1997.
[Irani and Anandan, 1998] M. Irani and P. Anandan. Video Indexing Based on Mosaic Representations, Proc. of IEEE (86), No. 5, pp. 905-921, May 1998.
[Kanatani, 1997] K. Kanatani. Introduction to Statistical Optimization for Geometric Computation (Lecture Note), 1997.
[Lucas and Kanade, 1981] B. D. Lucas and T. Kanade. An Iterative Image Registration Technique with an Application to Stereo Vision, Proc. of IUW, pp. 121-130, 1981.
[Mohr and Triggs, 1996] R. Mohr and B. Triggs. Projective Geometry for Image Analysis, a tutorial given at ISPRS, 1996.
[Pollefeys et al., 1998] M. Pollefeys, R. Koch and L. V. Gool. Self-Calibration and Metric Reconstruction in spite of Varying and Unknown Internal Camera Parameters, Proc. of ICCV, pp. 90-95, 1998.
[Shashua and Werman, 1995] A. Shashua and M. Werman. Trilinearity of Three Perspective Views and its Associated Tensor, Proc. of ICCV 95, pp. 920-925, 1995.
[Szeliski, 1996] R. Szeliski. Video Mosaics for Virtual Environments, IEEE Computer Graphics and Applications, pp. 22-30, March 1996.
[Szeliski and Shum, 1997] R. Szeliski and H.-Y. Shum. Creating Full View Panoramic Image Mosaics and Texture-mapped Models, SIGGRAPH 97, pp. 251-258, August 1997.
[Torr and Murray, 1997] P. H. S. Torr and D. W. Murray. The Development and Comparison of Robust Methods for Estimating the Fundamental Matrix, IJCV(24), No. 3, pp. 271-300, 1997.
[Xiong and Shafer, 1995] Y. Xiong and S. A. Shafer. Dense Structure From A Dense Optical Flow Sequence, Proc. of ISCV 95, 1995.