Multi-view Image-Based Rendering and Modeling


Multi-view Image-Based Rendering and Modeling
by
Qian Chen

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Computer Science)

May 2
Copyright 2 Qian Chen

Acknowledgements

I would first like to thank Prof. Gérard Medioni for being an absolutely wonderful thesis supervisor through the entire process. He started me out with great ideas and problems, provided good feedback and suggestions during the research, and made sure that the final output is of high quality. As well as teaching me computer vision, he showed me that good research can be fun. I thank him for making my time at USC a very rewarding experience.

Thanks to Prof. Ulrich Neumann and to Prof. Wlodek Proskurowski for serving on my committee and providing very useful comments on the thesis draft. Dr. Neumann is the leader of the Computer Interfaces thrust in the Integrated Media Systems Center which ultimately makes my research possible, academically and financially. I am grateful to Prof. Proskurowski for clarifying some of the issues in numerical optimization. Thanks also to Prof. Ram Nevatia for his interest in my research and many casual discussions conducted in the IRIS hallway.

I received help from several other members of the IRIS group and the CGIT group during various stages of this work. In particular, thanks to Alexandre Francois for being my lab mate during the years and answering questions whenever I had one; to Reyes Enciso for the valuable discussions about stereo and, especially, face reconstruction.

I deeply appreciate my wife and my lovely daughter for patiently supporting me throughout this long academic journey. Thanks also to my parents and the rest of my family who are all excited about "one of them" getting a doctorate.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT

CHAPTER 1 INTRODUCTION
  1.1 WHAT IS IMAGE-BASED RENDERING AND MODELING
  1.2 ISSUES
  1.3 OBJECTIVE AND APPROACH
  1.4 OUTLINE AND NOTATIONS

CHAPTER 2 ELEMENTS OF PROJECTIVE GEOMETRY
  2.1 PROJECTIVE SPACE
    2.1.1 Definitions
    2.1.2 Canonical affine space embedding and the plane-at-infinity
    2.1.3 Collineation
    2.1.4 Duality
    2.1.5 Properties of P^2
    2.1.6 Cross-ratio
  2.2 PROJECTION VS. RECONSTRUCTION
    2.2.1 Projection: the forward process
    2.2.2 Reconstruction: the inverse process
    2.2.3 Canonical forms of the projection matrices
  2.3 ABSOLUTE QUADRIC AND ABSOLUTE CONIC

CHAPTER 3 RELATED WORK
  3.1 STRUCTURE-FROM-MOTION
  3.2 RELATED WORK IN COMPUTER GRAPHICS
    3.2.1 Light-field rendering
    3.2.2 Image mosaicing
    3.2.3 Image warping
  3.3 RELATED WORK IN COMPUTER VISION
    3.3.1 Epipolar geometry
    3.3.2 Trifocal tensor
    3.3.3 Quad-linearity
    3.3.4 Projective reconstruction
    3.3.5 Self-calibration
  3.4 HYBRID METHODS
  3.5 SUMMARY

CHAPTER 4 IBRM FROM TWO VIEWS
  4.1 REVIEW OF RELATED WORK
    4.1.1 Aerial triangulation
    4.1.2 Binocular stereo
  4.2 EPIPOLAR GEOMETRY IN DETAIL
  4.3 RECTIFICATION
  4.4 IMAGE MATCHING
    4.4.1 The u-v-d volume
    4.4.2 Disparity surface extraction
    4.4.3 Extraction on the Hervé's Face stereo pair
    4.4.4 Improving time efficiency
  4.5 RECONSTRUCTION
    4.5.1 Projective reconstruction by matrix factorization
    4.5.2 Euclidean reconstruction
  4.6 RESULTS
    4.6.1 Pentagon pair
    4.6.2 Renault Automobile Part
    4.6.3 Hervé's Head
    4.6.4 A full head example
  4.7 A FACE RECONSTRUCTION SYSTEM

CHAPTER 5 MULTI-VIEW PROJECTIVE RECONSTRUCTION
  5.1 FACTORIZATION-BASED METHODS
    5.1.1 Factorization-based reconstruction for affine cameras
    5.1.2 Extension to perspective cameras
    5.1.3 The Iterative Factorization Algorithm (IFA)
  5.2 RANK THEOREM UNDER THE COMMON-IMAGE-PLANE CONDITION
  5.3 A GEOMETRIC EXPLANATION
  5.4 ITERATIVE METHODS
    5.4.1 Mohr's non-linear minimization method
    5.4.2 Our approach
    5.4.3 Simulation results
  5.5 EXPERIMENTAL RESULTS ON NOISY DATA
    5.5.1 Experiments on synthetic data
    5.5.2 Experiments on real data
  5.6 FINAL REMARKS

CHAPTER 6 APPLICATION OF PROJECTIVE RECONSTRUCTION: VIEW SYNTHESIS
  6.1 REVIEW OF RELATED WORK
    6.1.1 View morphing
    6.1.2 View synthesis
    6.1.3 View synthesis in tensor space
  6.2 SYSTEM OVERVIEW

  6.3 WARPING FORMULATION
    6.3.1 Vertex mapping
    6.3.2 Triangle filling
    6.3.3 Fundamental matrix computation
  6.4 ADDRESSING PARTIAL OCCLUSION
  6.5 ADDRESSING TOTAL OCCLUSION
  6.6 RESULTS
  6.7 SOME REMARKS

CHAPTER 7 MULTI-VIEW EUCLIDEAN RECONSTRUCTION
  7.1 PDM ESTIMATION
    7.1.1 Algorithm description
    7.1.2 Relation to existing methods
    7.1.3 Initialization
  7.2 SELF-CALIBRATION
  7.3 SIMULATION
  7.4 RESULTS ON REAL IMAGES
  7.5 APPLICATION TO OBJECT INSERTION
    7.5.1 Implementation issues

CHAPTER 8 CONCLUSION AND FUTURE WORK

REFERENCES

APPENDIX A MATRIX DECOMPOSITION

LIST OF FIGURES

Figure 1.1 Computer graphics: the forward problem
Figure 1.2 Computer vision: the backward problem
Figure 1.3 IBRM: bridging the gap
Figure 1.4 The overall system
Figure 4.1 Aerial Triangulation and Stripe Adjustment
Figure 4.2 Bundle Adjustment in photogrammetry
Figure 4.3 Epipolar geometry
Figure 4.4 Illustration of the rectification algorithm
Figure 4.5 The original and the rectified stereo pair
Figure 4.6 Pseudo code of the tracing algorithm
Figure 4.7 Results on Hervé's Face stereo pair
Figure 4.8 Output disparity map of the Hervé's Face pair
Figure 4.9 Results of the Pentagon pair
Figure 4.10 Results of the Renault Auto Part pair
Figure 4.11 Results on the Hervé's Head pair
Figure 4.12 Reconstructed full head model
Figure 4.13 Results from tensor-voting
Figure 4.14 Our stereo system made from two FUJI DS
Figure 4.15 Examples and results from the Face System
Figure 4.16 More examples and results
Figure 5.1 Illustration of IFA

Figure 5.2 Pseudo code of IFA
Figure 5.3 The common-image-plane condition
Figure 5.4 Average reconstruction error of the CIL- dataset
Figure 5.5 Illustration of WIE
Figure 5.6 Pseudo code of WIE for projective reconstruction
Figure 5.7 The planar configuration
Figure 5.8 The spherical configuration
Figure 5.9 Convergence of WIE
Figure 5.10 The five largest singular values in iterations
Figure 5.11 Comparison of convergence rate
Figure 5.12 Reprojection error vs. noise level
Figure 5.13 The terminal sequence
Figure 5.14 The wooden house sequence
Figure 6.1 Illustration of the view synthesis approach
Figure 6.2 Illustration of trifocal tensor
Figure 6.3 System flow-chart
Figure 6.4 Homography between two image planes
Figure 6.5 Mapping of T-junctions
Figure 6.6 Visibility compatible order
Figure 6.7 Handling occlusion
Figure 6.8 Synthesized views of a computer monitor
Figure 6.9 A fuller synthesis

Figure 6.10 Looking into a room: an extrapolation example
Figure 7.1 Locus of principal point in 2 experiments
Figure 7.2 Relative errors vs. noise level (σ)
Figure 7.3 Relative errors vs. view separation (θ)
Figure 7.4 Relative errors vs. number of views (N)
Figure 7.5 Reconstruction results of the terminal sequence
Figure 7.6 Two views of the reconstructed house
Figure 7.7 Misalignment of real and virtual objects
Figure 7.8 Correct registration after error compensation

LIST OF TABLES

Table 1. Notation table
Table 2. Recovered relative camera movement (mm)
Table 3. Comparison over 4 views: the terminal sequence
Table 4. Comparison over 8 views: the terminal sequence
Table 5. Results on the house sequence
Table 6. Average Euclidean reconstruction error on real data

Abstract

While both work with images, Computer Graphics and Computer Vision are different fields. Computer graphics starts with the creation of geometric models and produces image sequences. Computer vision starts with images or image sequences and produces interpretations including geometric models. Lately, there has been a meeting in the middle, the goal being to create photorealistic images with the help of accurate models recovered from projections of real objects. This convergence has produced a new subfield called Image-Based Rendering and Modeling (IBRM).

In this research, the geometric aspects of IBRM are studied. The case of two views is studied first. New algorithms are developed that automatically align the input images, match them and reconstruct 3-D surfaces in Euclidean space. The matching algorithm is designed to cope with complex shapes such as human faces. The reconstruction algorithms are then generalized to the multi-view case, based on a stratified framework. At the root of the stratification is a novel projective reconstruction algorithm that produces a non-metric structure of the scene. An image-based rendering system is implemented that directly uses this non-metric structure to synthesize images from novel viewpoints. On top of that, a novel algorithm is developed to upgrade the non-metric structures into metric ones. Intrinsic and extrinsic camera parameters are obtained at the same time. One potential application, showcased in this thesis, is to blend computer graphics objects into real images with correct perspective and occlusion.

The proposed theory and the associated algorithms lay down the groundwork for future development in multi-view image-based rendering and modeling.

Chapter 1
Introduction

1.1 What is Image-Based Rendering and Modeling

Traditionally, computer graphics and computer vision are considered to be inverse problems. Working in the forward direction, computer graphics begins with the creation of geometric models. These models are subsequently tessellated, colored (or textured), lit and projected onto a computer screen, producing images (Figure 1.1). Therefore, much of the effort in graphics research has been devoted to the efficient representation, storage and rendering of geometric models. The problems with graphics are that, no matter how detailed the models can be, and how fast rendering can be performed, model creation is still time consuming, and the final images may look artificial.

[Figure 1.1 Computer graphics: the forward problem. Diagram labels: Model, Synthetic Camera, Image.]

Working in the backward direction, computer vision tries to replicate the human vision process on computers, producing high-level interpretations from real image sequences (Figure 1.2). Often, a layered approach is taken: In early vision, spatial and temporal features are detected. Features are then grouped according to some similarity measure. For example, nearby and parallel edge segments are grouped into lines; connected and perpendicular lines form rectangles. Groups are aggregated into objects. Once objects are formed, intelligent tasks such as recognition can be performed. The difficulty of such an approach (and of computer vision in general) is that these layers are interdependent. While achieving high-level objectives relies on solving low-level problems, the availability of high-level knowledge certainly makes the task of solving low-level problems easier and more efficient. Thus, a robust vision system should work both top-down and bottom-up. Such a system has yet to be successfully demonstrated.

[Figure 1.2 Computer vision: the backward problem. Diagram labels: Images by Real Cameras, Interpretation, Symbolic Description ("Dog? Camel? Eatable?").]

Recently, there has been a convergence of research to bridge the gap between the two fields. Looking in the direction of computer vision, we see a tendency to aim at the recovery of geometric models, branching off from the original intent of obtaining symbolic or semantic descriptions. Looking in the direction of computer graphics, we see more and more applications that demand photorealistic results. At the intersection is the emergence of a new sub-field called Image-Based Rendering and Modeling. IBRM represents the effort of using the geometric and photometric information recovered from real images to generate new images, with the hope that the synthesized ones appear photorealistic, while also reducing the time spent on model creation. It tries to leverage the ease with which photographs or videos are taken and their amazing power to communicate. In fact, our eyes are so fine-tuned to texture and color information that much of the inaccuracy present in the recovered data goes unnoticed.

[Figure 1.3 IBRM: bridging the gap. Diagram labels: Image-Based Rendering/Modeling, Image-Based Rendering, Real Images, Synthesized Image, Image-Based Modeling.]

1.2 Issues

There is a full spectrum of existing research works tackling the IBRM problem, ranging from those based on structure reconstruction to model-free ones. In between are those which make use of both geometric and photometric information, and thus possess some of the advantages of the two extremes. In analyzing these different approaches, we have noticed the following issues.

Image understanding vs. structure recovery

Despite tremendous progress in vision research in the past few decades, what has been achieved in the image understanding field as the front end of an intelligent system remains limited. Part of the reason is that there is strong indication of interaction between high and low level information processing in the human vision process. Take edge detection as an example. At the lower level, edges consist of pixels with large intensity changes; at the higher level, they are the boundaries of objects. The correct way to perform edge detection therefore is to couple it with other high-level operations such as segmentation and recognition. Unfortunately, ongoing research efforts have not been successful in modeling this kind of interaction.

If the purpose is to recover structures, especially for graphics usage, the situation becomes quite different. First, in most cases, it is unnecessary to understand the images. Second, when it becomes necessary, a human operator is allowed to provide the high-level information, while the computers perform all the low-level computational work.

Structure recovery vs. image warping

The traditional way of recovering structures involves two steps. First, one calibrates the cameras, meaning that both their internal parameters and external poses are determined. Second, correspondences among images of scene features are established. From these two pieces of information, it is possible to compute 3-D Euclidean coordinates of the involved entities. While this process is seemingly doing the right thing in the sense that we live in a three-dimensional Euclidean world, it has some undesired properties. For instance, calibration is a tedious procedure, often performed off-line with some mechanically measured ground truth data. In addition, it is only valid in a certain range. The accuracy degrades as the real environment departs from the original calibration setup.

On the other hand, if all one wants are images rather than models, and if there are ways to warp existing images directly, why bother to create models (and to calibrate the cameras) at all? This is exactly the idea that prompted the advent of image-based rendering. It has been demonstrated by many authors (Chen and Williams [7], Havaldar, Lee and Medioni [43], McMillan and Bishop [6], Seitz and Dyer [8]) in the graphics community that, at least in some special cases, image warping is possible using only the correspondence information.

Euclidean vs. projective reconstruction

A similar problem has also been raised in the vision community: is it overkill to try to reach Euclidean reconstruction all the time? The way computer vision scientists

approach the problem is to find a less restrictive (compared to Euclidean) structure in the hope that it requires less data to compute, and still serves many useful applications. In his pioneering work in this area, Faugeras [26] showed that, from correspondence data only, he could recover the projective structure of the scene (thus projective reconstruction), which differs from the Euclidean one by an unknown 4×4 transformation. Later, Chen and Medioni [2], Avidan and Shashua [4] independently demonstrated that such a structure is sufficient for image warping. The result is the same as if the images were projected from Euclidean structures. In [3] and [3], Faugeras et al. further showed that a projective structure can be upgraded into a Euclidean one given more information about either the scene or the cameras.

1.3 Objective and Approach

The goal of this research is to develop new algorithms for multi-view image-based rendering and modeling applications. The problems involved have dual appearances in computer vision and computer graphics, which means that their solutions can be looked at from either viewpoint, and are likely to find applications in both fields. We approach a vision problem from a graphics perspective: we perform structure recovery rather than image understanding. Simultaneously, we address a graphics problem using computer vision techniques: image warping based on projective reconstruction, and modeling based on Euclidean reconstruction. By developing algorithms, we solve some of the fundamental problems in computer vision, in particular, structure-from-motion. At the same time, we demonstrate how these algorithms can be applied in graphics.

[Figure 1.4 The overall system. Diagram labels: View 1, View 2, ..., View n; Matching; Correspondences; Projective Reconstruction; Projective (non-metric) Structure; View Synthesis; Euclidean Reconstruction; Euclidean (metric) Structure; Face Reconstruction; Object Insertion.]

The overall system is depicted in Figure 1.4, where we see a stratified structure: starting from a set of images of a (static) scene, correspondences are first established, automatically or with human assistance; projective (or non-metric) structures are then computed and, subsequently, transformed into Euclidean (or metric) ones. Each stratum supports certain applications, depending on the underlying mathematical problems to be addressed.

1.4 Outline and Notations

Chapter 2 introduces some preliminary concepts from projective geometry which are going to be referenced in many places throughout this thesis. They are helpful in understanding the materials in this thesis. Chapter 3 reviews related work in both the computer graphics and computer vision arenas. In Chapter 4, a complete two-view IBRM system is presented. Then, in Chapter 5, the projective reconstruction algorithm developed previously is generalized to multiple views. Its application is demonstrated in Chapter 6 with a view synthesis system. Chapter 7 extends the Euclidean reconstruction algorithm to multiple views, and demonstrates its usage by an application which blends graphic objects into real images. Both Chapter 5 and Chapter 7 include experimental results from simulation and on real datasets. Chapter 8 concludes the thesis and discusses future work. Several forms of matrix decomposition are used in this thesis. They are summarized in the Appendix for the purpose of easy reference.

Mathematical notations are used as follows (Table 1). Italic lower case characters (s) represent scalars. Italic upper case characters (M) represent matrices. Bold and italic

lower case characters (v) represent one-dimensional arrays which can be points, vectors or columns/rows of matrices. Unless otherwise stated, the default assumption is that coordinates are written in column format. Under this convention, the dot product of two vectors a and b is $a^{\top} b$, and the tensor product is $a b^{\top}$. When a subscript is attached, it represents the component of the original quantity indexed by the subscript. For instance, let t be a 3-tuple vector, then t can be written as $t = [t_1, t_2, t_3]^{\top}$. As another example, if M is a 3x3 matrix, then $m_{22}$ is the center element; and M can be written either in its column vectors as $[M^{(1)}, M^{(2)}, M^{(3)}]$, or in its row vectors as $[M_{(1)}; M_{(2)}; M_{(3)}]$ (rows stacked vertically). Finally, bold and italic upper case characters (C) represent sets.

Table 1. Notation table

Notation          Meaning
s                 scalar
v                 point, vector
$a^{\top} b$      vector dot product
$a b^{\top}$      vector tensor product
$a \times b$      vector cross product
M                 matrix
$M^{(i)}$         i-th column of matrix M
$M_{(i)}$         i-th row of matrix M
S                 set
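The three vector products in Table 1 can be made concrete with a few lines of code. The following is only an illustrative sketch in plain Python: the function names `dot`, `tensor` and `cross` are hypothetical and vectors are modelled as simple lists, which is an assumption of this example rather than anything prescribed by the thesis.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product a^T b: a single scalar."""
    return sum(x * y for x, y in zip(a, b))


def tensor(a: Vector, b: Vector) -> List[List[float]]:
    """Tensor (outer) product a b^T: a len(a) x len(b) matrix."""
    return [[x * y for y in b] for x in a]


def cross(a: Vector, b: Vector) -> List[float]:
    """Cross product a x b of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print(dot(a, b))      # scalar
print(tensor(a, b))   # 3x3 matrix
print(cross(a, b))    # vector orthogonal to both a and b
```

The cross product is orthogonal to both of its arguments, so `dot(a, cross(a, b))` evaluates to zero; that property is what makes it useful later for constructing orthogonal axes.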

Chapter 2
Elements of Projective Geometry

This chapter introduces some elementary projective geometry that will be used in this thesis. The introduction is intended to be informal but informational. For further study, Faugeras' monograph [28] is a good place to start. More advanced topics are discussed in his paper [3]. The tutorial by Triggs and Mohr [63] is quite thorough in terms of describing general concepts. The tutorial by Hartley and Zisserman [99], on the other hand, concentrates on the computational aspects of multi-view geometry. Finally, any textbook about projective geometry should also be helpful.

2.1 Projective Space

2.1.1 Definitions

Given a coordinate system, an n-dimensional real affine space is the set of points parameterized by the set of all n-component real column vectors $[x_1, \dots, x_n]^{\top} \in R^n$. An n-dimensional Euclidean space, denoted as $E^n$, is a subspace of $R^n$ such that it is invariant to rigid transformation.

The points of real n-dimensional projective space $P^n$ can be represented by (n+1)-component real column vectors $[x_1, \dots, x_{n+1}]^{\top} \in R^{n+1}$, with the provisos that at least one coordinate must be non-zero and that the vectors $[x_1, \dots, x_{n+1}]^{\top}$ and $\lambda [x_1, \dots, x_{n+1}]^{\top}$ represent the same point of $P^n$ for all $\lambda \neq 0$. The $x_i$'s are called the homogeneous coordinates of the projective point. The concept of a point in projective space thus

describes an equivalence relationship: $X \sim Y$ iff $\exists \lambda$ such that $X = \lambda Y$, where $\lambda \in R$, $\lambda \neq 0$, and $X, Y \in P^n$. This relationship, denoted by ~, is referred to as equality up to a scale.

The set of all points in $P^n$ whose coordinates satisfy a linear equation $a_1 x_1 + \dots + a_{n+1} x_{n+1} = 0$, where $a_i \in R$ for all $i = 1, \dots, n+1$, is called a hyper-plane. Obviously, a hyper-plane in projective space can also be represented by an (n+1)-component vector $[a_1, \dots, a_{n+1}]$. Conventionally, hyper-planes are written as row vectors. When n=2, the hyper-plane degenerates into a projective line: $[a_1, a_2, a_3]$.

2.1.2 Canonical affine space embedding and the plane-at-infinity

An affine space $R^n$ can be embedded isomorphically in $P^n$ by the standard injection: $[x_1, \dots, x_n]^{\top} \mapsto [x_1, \dots, x_n, 1]^{\top}$. Conversely, affine points can be regenerated from projective ones with $x_{n+1} \neq 0$ by the mapping

$$[x_1, \dots, x_{n+1}]^{\top} \sim \left[\frac{x_1}{x_{n+1}}, \dots, \frac{x_n}{x_{n+1}}, 1\right]^{\top} \mapsto \left[\frac{x_1}{x_{n+1}}, \dots, \frac{x_n}{x_{n+1}}\right]^{\top}.$$

Under such an embedding, any projective point with $x_{n+1} = 0$ corresponds to a vanishing point in the n-dimensional affine space. The set of all such points forms the Plane-at-Infinity, denoted by $\Pi_\infty$.

It should be pointed out that the above mappings and definitions simply constitute a convention. They are only meaningful if we are told in advance that $[x_1, \dots, x_n]^{\top}$ represents a normal affine space and $x_{n+1}$ is a special homogenizing coordinate. In a projective space,

any one coordinate can act as the homogenizing coordinate, and all hyper-planes are equivalent: none is specially singled out as the Plane-at-Infinity. However, the choice for $R^n$ and $\Pi_\infty$ must be consistent in a way such that $P^n \sim R^n + \Pi_\infty$ holds.

2.1.3 Collineation

A full rank (n+1)x(n+1) matrix defines a linear transformation, or collineation, from $P^n$ to itself. A collineation is also defined up to a scale: if $A = \rho B$, then $\forall X \in P^n$, $AX \sim BX$. Hence the symbol ~ also applies to collineations.

2.1.4 Duality

Hyper-planes and points are dual entities which satisfy the following Duality Principle: For any projective result established using points and hyper-planes, a symmetrical result holds in which the roles of points and hyper-planes are interchanged: points become planes, and vice versa. For instance, the equation

$$[a_1, \dots, a_{n+1}] \begin{bmatrix} x_1 \\ \vdots \\ x_{n+1} \end{bmatrix} = A X = 0$$

has two different but symmetric interpretations: a point set where the points are coplanar (fixing A), or a plane set where the planes are coincident to a point (fixing X). The duality determines that a hyper-plane has dual representations: an equation in $x_i$'s, if treated as a

point set; or an (n+1)-tuple row vector as a point in the dual point space. Note that under the canonical affine embedding, the dual representation of $\Pi_\infty$ has the form $[0, \dots, 0, 1]$, which is called the canonical form of $\Pi_\infty$. Unless mentioned otherwise, canonical representations are used for various entities throughout this thesis.

Let H be a collineation for points. From the previous equation, we have

$$\left([a_1, \dots, a_{n+1}]\, H^{-1}\right) \left(H \begin{bmatrix} x_1 \\ \vdots \\ x_{n+1} \end{bmatrix}\right) = 0.$$

Hence $H^{-1} \sim H^{*}$, applied from the right to row vectors, is a collineation for hyper-planes, where $H^{*}$ is the adjoint matrix of H. In other words, H and $H^{*}$ are dual collineations.

A set of hyper-planes incident to a common line is named a pencil-of-planes. The dual structure of a pencil-of-planes is a set of co-linear points. If two of the hyper-planes are parallel, their intersection is at $\Pi_\infty$, thus all hyper-planes in the pencil ought to be parallel to one another, otherwise at least one pair would intersect within the affine space.

2.1.5 Properties of $P^2$

In 2-D projective space $P^2$, the set of vanishing points forms the Line-at-Infinity, denoted as $L_\infty$. The canonical form of $L_\infty$ is: (0, 0, 1). Its dual form is: $x_3 = 0$. The collineation in $P^2$ is historically called homography. Since a homography is a 3x3 matrix defined up to a scale, there are 8 independent parameters. To compute it, four point correspondences suffice. Dually, four line correspondences suffice for the solution of a dual homography.
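The counting argument above (8 free parameters, two equations per point correspondence) can be checked numerically. The sketch below is not the thesis's implementation: it assumes the normalization $h_{33} = 1$, which breaks down for homographies whose true $h_{33}$ is zero, and the helper names `solve_linear`, `homography_from_4_points` and `apply_homography` are hypothetical. It uses only the Python standard library.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration, e.g. three collinear points")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_4_points(src: Sequence[Point],
                             dst: Sequence[Point]) -> List[List[float]]:
    """Return H (3x3, h33 fixed to 1) such that dst ~ H src for four pairs."""
    rows: List[List[float]] = []
    rhs: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and likewise for v.
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    h = solve_linear(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h: List[List[float]], p: Point) -> Point:
    """Map an affine 2-D point through H and divide out the scale."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
quad = [(0.0, 0.0), (2.0, 0.0), (3.0, 2.0), (0.0, 1.0)]
H = homography_from_4_points(square, quad)
for corner in square:
    print(corner, "->", apply_homography(H, corner))
```

With more than four correspondences, or with noisy data, one would normally solve the homogeneous system in a least-squares sense instead of fixing one entry; the exact solve here is only meant to show that four points are enough.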

A set of coincident lines in $P^2$ is called a pencil-of-lines. From the Duality Principle, we know that a pencil-of-lines is a projective 1-D structure. If two lines are parallel, all lines are parallel to one another because their common intersection is on the Line-at-Infinity.

2.1.6 Cross-ratio

Let A, B, C, D be four co-linear points in a projective space. The cross-ratio is defined as

$$\mathrm{cross}(A, B; C, D) = \frac{|AC|}{|AB|} \cdot \frac{|BD|}{|CD|},$$

where $|\cdot|$ denotes the distance between two points. If one of the four points is a vanishing point, e.g. D, then $\mathrm{cross}(A, B; C, D) = |AC| / |AB|$. Cross-ratio is invariant to projective transformations. In projective space, three points form a basis for a line: given such a basis, a fourth point is uniquely determined by a cross-ratio. The cross-ratio of a pencil-of-planes (lines) is defined as that of its dual structure.

2.2 Projection vs. Reconstruction

2.2.1 Projection: the forward process

A perspective camera projection is represented by a 3x4 matrix P, mapping $P^3 \to P^2$:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \sim A\,[R \;\; T] \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = [AR \;\; AT]\, X = P X,$$

where (u, v) are the image coordinates of the Euclidean world point $[x, y, z]^{\top}$, A represents intrinsic camera parameters, and R, T represent the rigid transformation from the world coordinate system to the camera coordinate system. To be able to write the projection process in such a nice linear form is the primary reason for treating the image plane and the world as projective spaces. Notice that P includes a 3x3 part that is related to the camera's orientation and a 3x1 part that is related to its position.

When the projection center of a camera is on the intersection line of a pencil-of-planes, the image of the said pencil-of-planes is a pencil-of-lines. The two have the same cross-ratio since it is preserved under camera projection.

2.2.2 Reconstruction: the inverse process

Given the correspondence information among the image coordinates of n points in m views, $(u_{ij}, v_{ij})$ where $i = 1, \dots, m$ and $j = 1, \dots, n$, it is known that the projection matrices of the cameras $P_i$ and the homogeneous coordinates of the points $X_j$ can be computed satisfying

$$\begin{bmatrix} u_{ij} \\ v_{ij} \\ 1 \end{bmatrix} \sim P_i X_j.$$

This process is called projective reconstruction, and the set of all reconstructed points forms a projective structure. Such a structure is not unique: for any non-singular 4x4 matrix H, $P_i X_j = (P_i H)(H^{-1} X_j) = P'_i X'_j$. In other words, all reconstructed structures are equivalent up to a 4x4 collineation. Since collineation only preserves cross-ratio, projective structures are non-metric. Among all the equivalents, those which exist in

Euclidean spaces are Euclidean structures. All Euclidean structures are equivalent up to a rigid transformation. Given a Euclidean structure Y, and an equivalent projective structure X, of the same set of points, there exists a collineation H such that $\forall y \in Y$, $Hy \in X$. H is called a Projective Distortion Matrix. From this point of view, given a projective structure X, Euclidean reconstruction amounts to finding the Projective Distortion Matrix H because, by definition, $H^{-1}X$ must be a Euclidean structure.

2.2.3 Canonical forms of the projection matrices

Let $P_i$, $i = 1, \dots, m$, be the set of projection matrices resulting from a projective reconstruction. Write $P_1$ as a 3x3 and a 3x1 sub-matrix, $[M \;\; t]$. The first canonical form looks like: $[I \;\; 0], [A_2 \;\; a_2], \dots, [A_m \;\; a_m]$. The geometrical explanation is that the (updated) world coordinate system is coincident with that of the first camera. This form is obtained by right multiplying each (original) projection matrix by

$$H = \begin{bmatrix} M & t \\ 0^{\top} & 1 \end{bmatrix}^{-1} = \begin{bmatrix} M^{-1} & -M^{-1} t \\ 0^{\top} & 1 \end{bmatrix}.$$

At the same time, the projective structures must be updated by left multiplying with

$$H^{-1} = \begin{bmatrix} M & t \\ 0^{\top} & 1 \end{bmatrix}.$$

We can further rotate the coordinate system so that the projection center of the second camera (denoted by $O_2$) is located on the x-axis, giving the second canonical form. Writing the rotation matrix as R, then $O_2 = -R^{\top} A_2^{-1} a_2 = R^{\top} \tilde{a}_2$, where $\tilde{a}_2 = -A_2^{-1} a_2$. Imposing $O_2$ on the

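x-axis implies that its y and z components must vanish. One can choose $R^{(2)}$ and $R^{(3)}$ (the 2nd and 3rd columns of R) as two orthogonal unit vectors from a plane perpendicular to $\tilde{a}_2 = -A_2^{-1} a_2$ (the center of the second camera before the rotation), and $R^{(1)}$ as the cross-product of the previous two. In writing, the second canonical form appears as $[B_1 \;\; 0], [B_2 \;\; b_2], \dots, [B_m \;\; b_m]$, which conceals the underlying geometrical significance, however.

2.3 Absolute Quadric and Absolute Conic

Any symmetric 4×4 matrix Q uniquely defines a non-empty quadric: $x^{\top} Q x = 0$. $p = x^{\top} Q$ is the tangent plane to Q passing through x. Obviously, p satisfies $p\, Q^{*} p^{\top} = 0$ when $Q^{*}$ is nondegenerate, which is the dual structure of Q. The absolute quadric Ω is a degenerate quadric whose rank is 3. Under a point transform M: $x \mapsto Mx$, Ω is transformed by $M \Omega M^{\top}$. Under a camera projection J, Ω is mapped to $J \Omega J^{\top}$. The canonical form of Ω in a Euclidean space (with canonical affine embedding) is

$$\Omega = \begin{bmatrix} I & 0 \\ 0^{\top} & 0 \end{bmatrix}.$$

Let R and T represent a rigid transformation, then

$$\begin{bmatrix} R & T \\ 0^{\top} & 1 \end{bmatrix} \begin{bmatrix} I & 0 \\ 0^{\top} & 0 \end{bmatrix} \begin{bmatrix} R^{\top} & 0 \\ T^{\top} & 1 \end{bmatrix} = \begin{bmatrix} R R^{\top} & 0 \\ 0^{\top} & 0 \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0^{\top} & 0 \end{bmatrix},$$

which means that Ω is invariant to rigid motion. Let $P = A\,[R \;\; T]$ be a camera projection matrix, then

$$A\,[R \;\; T] \begin{bmatrix} I & 0 \\ 0^{\top} & 0 \end{bmatrix} \begin{bmatrix} R^{\top} \\ T^{\top} \end{bmatrix} A^{\top} = A R R^{\top} A^{\top} = A A^{\top},$$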

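or, shortly, $\omega = P \Omega P^{\top} = A A^{\top}$.

The dual of Ω is not a set of planes, but a conic curve on $\Pi_\infty$ called the absolute conic, denoted as Γ. Γ has the same canonical form as Ω: for a point $X = [x^{\top}, 0]^{\top}$ on Γ,

$$[x^{\top} \;\; 0] \begin{bmatrix} I & 0 \\ 0^{\top} & 0 \end{bmatrix} \begin{bmatrix} x \\ 0 \end{bmatrix} = x^{\top} x = 0,$$

which only has virtual roots. The projection of Γ is described by

$$u_C = P X = A\,[R \;\; T] \begin{bmatrix} x \\ 0 \end{bmatrix} = A R x.$$

Thus $A^{-1} u_C = R x$. And further,

$$u_C^{\top} A^{-\top} A^{-1} u_C = u_C^{\top} \omega^{-1} u_C = x^{\top} R^{\top} R x = x^{\top} x = 0.$$

In other words, the image of Γ is a 2-D conic curve $\omega^{-1}$, which is the dual of the image of Ω. The implication here is that the internal camera parameter matrix is related to, and can be computed from, the image of the absolute conic (or the dual image of the absolute quadric), using Cholesky decomposition (Appendix A.2).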
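The last step, going from $\omega = A A^{\top}$ back to A, can be sketched as follows. This is a minimal illustration, not the procedure of Appendix A.2 (which is not part of this excerpt). It assumes that A is upper triangular with positive diagonal and $a_{33} = 1$, the usual pinhole parametrization; the entry names `fx`, `fy`, `skew`, `u0`, `v0` and the function names are hypothetical. Under that assumption the upper-triangular Cholesky-type factor can be read off entry by entry, starting from the bottom-right corner.

```python
import math
from typing import List

Matrix = List[List[float]]


def times_transpose(a: Matrix) -> Matrix:
    """Return a a^T for a square matrix a."""
    n = len(a)
    return [[sum(a[i][k] * a[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def intrinsics_from_omega(omega: Matrix) -> Matrix:
    """Factor omega ~ A A^T with A upper triangular and A[2][2] = 1."""
    if omega[2][2] <= 0.0:
        raise ValueError("omega must be positive definite")
    # omega is only known up to scale: normalize so that w[2][2] == 1.
    w = [[value / omega[2][2] for value in row] for row in omega]
    u0, v0 = w[0][2], w[1][2]               # last column of A
    fy_sq = w[1][1] - v0 * v0
    if fy_sq <= 0.0:
        raise ValueError("omega must be positive definite")
    fy = math.sqrt(fy_sq)
    skew = (w[0][1] - u0 * v0) / fy
    fx_sq = w[0][0] - skew * skew - u0 * u0
    if fx_sq <= 0.0:
        raise ValueError("omega must be positive definite")
    fx = math.sqrt(fx_sq)
    return [[fx, skew, u0], [0.0, fy, v0], [0.0, 0.0, 1.0]]


# Made-up intrinsics, used only to exercise the round trip.
A_true = [[800.0, 2.0, 320.0], [0.0, 780.0, 240.0], [0.0, 0.0, 1.0]]
# Multiply by an arbitrary factor to mimic the unknown projective scale.
omega = [[3.5 * value for value in row] for row in times_transpose(A_true)]
A_est = intrinsics_from_omega(omega)
for row in A_est:
    print(row)
```

A general-purpose Cholesky routine returns a lower-triangular factor, so in practice it would be applied to a row- and column-reversed copy of ω (or to its inverse) to obtain the upper-triangular A; the closed form above avoids that bookkeeping for the 3x3 case.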

Chapter 3
Related Work

To a certain degree, the effort on image-based modeling started at least twenty years ago when the structure-from-motion (SFM) problem (i.e. obtaining structural information from the images of a moving camera) was approached by computer vision scientists. Unfortunately, up until now, successful vision systems remain uncommon. There are many issues involved, some of which were just discussed in section 1.2. Lately, the following question has been raised: rather than trying to address all those issues, can the requirement on structure recovery be reduced? This simple question has triggered a lot of studies on the topic of non-metric structure-from-motion (NSFM).

On the graphics side, almost at the same time, image-based rendering has been experimented with as a way to gain photorealism: it generates new images by warping and combining existing images. It was soon realized that non-metric structure is sufficient for image warping. Then, methods which convert non-metric structures into metric ones were also reported.

Algorithms developed for attacking these problems form a theoretical foundation for Image-Based Rendering and Modeling. It is this common interest that brings both vision and graphics researchers into this newly formed field.

3.1 Structure-From-Motion

The literature under this topic is quite extensive. Here, only three approaches are reviewed, which suffice to demonstrate the diversity of research in this area.

In Szeliski et al. [87], an object is placed on a turntable whose rotation angle is read from marks on the side. Shape is recovered in terms of a sequence of silhouette curves. The advantage is that the difficult correspondence problem is avoided. This approach, however, relies on knowing the rotation angle, which largely limits its applications.

In feature-based approaches, distinguished features such as points (Pollefeys et al. [72]), edges (Beardsley et al. [8]) and regions (Ma et al. [57]) are identified in the first image and then tracked along the image sequence. Compared to matching pixels between two well-separated images, tracking features across temporally dense samples is relatively easy. The trade-off is that the resulting point set is sparse. Consequently, these approaches are suited to blocky objects rather than smooth ones. The approach taken by Pighin et al. [69] is to deform a generic model to fit the recovered data using scattered data interpolation. The system is tuned for human faces, in which case common knowledge tells where more points are needed to cover high curvature areas. Correspondence in this system is established interactively.

In the third category, optical flow techniques are used to generate a dense correspondence map (e.g. Pollefeys et al. [7]). Optical flow computation is one of the fundamental problems encountered in processing image sequences and has been extensively studied in computer vision. The goal is to compute an approximation to the 2-D motion field (the projection of the 3-D velocities of surface points onto the imaging surface) from spatial-temporal patterns of the image intensity. It is argued in [3] (Aloimonos and Brown) and [96] (Verri and Poggio) that estimation of the optical flow is an ill-posed problem due to inherent differences between the 2-D motion field and intensity

variations. This is reflected in the survey paper [7] by Barron et al., where none of the techniques can consistently produce low error rate and high density in all testing cases.

3.2 Related Work in Computer Graphics

3.2.1 Light-field rendering

Theoretically, the only way to recreate true photo-realism is to sample and then reconstruct the 6-D plenoptic function (Adelson et al. []): f(λ, ϕ, θ, x, y, z). In other words, it is necessary and sufficient to sample light of all wavelengths (λ) from all directions (ϕ, θ) at all locations (x, y, z). It will be a long time before storage technology allows us to do this. However, it is possible to contrive an environment where some of the parameters are constant. This is demonstrated by the work of Gortler et al. [35] and Levoy et al. [53], where only a 3-D sub-function is considered (internally, a 4-D representation is used). The idea is to first build some form of lookup table by taking image samples of an object from many camera viewpoints, trying to record all outgoing light, thus the name light field. Subsequently, the image associated with an arbitrary viewpoint is synthesized by interpolating the stored lookup table.

The advantage of this class of methods is that, unlike all other methods, pixel correspondence is not necessary. The primary disadvantage, as indicated earlier, is the requirement of a large set of samples to produce the lookup table. For that reason, currently, the range of field of view is relatively small.

3.2.2 Image mosaicing

Rather than sample all light emitted from an object, methods have been proposed to capture the light coming into the observer by rotating a camera around a fixed point as close as possible to the camera center. The multiple images so obtained are stitched into one larger or higher-resolution image (a panoramic image), which allows views within it to be quickly generated. The mathematical foundation for image mosaicing (see [88] by Szeliski for a formal description) is that different images of non-planar objects obtained from a fixed-location camera are related by homographies. When the camera displacement is small, the pixel correspondences and the homographies can be solved concurrently using optical flow techniques [8].

While rectilinear panoramic images are convenient representations for a relatively small field of view (less than 90°), they are problematic for very wide scenes. In such circumstances, either cylindrical (Chen [8], McMillan et al. [6]) or spherical representations (Szeliski et al. [89]) are more suitable. Chen's work did not support view interpolation: a viewer has to hop from one node to the next. On the other hand, at each node, pan-tilt-zoom operations are allowed. The work of McMillan et al. extended the concept of epipolar geometry for planar images to cylindrical images. Their method was able to synthesize virtual cylinders, thus generating panoramic images at virtual viewpoints. The drawback of using a cylindrical panoramic image is its inability to include parts of the scene at the top and bottom. This deficiency has been overcome in systems that output spherical images such as the one by Szeliski and Shum [89].
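The homography relation that underlies mosaicing can be illustrated with a small sketch. It assumes an ideal pinhole camera with fixed intrinsics K that rotates about its projection center, in which case the two views are related by H = K R K^-1. The helper names and the numeric values of K are hypothetical and chosen only for illustration; they are not taken from the systems cited above.

```python
import math

def matmul(A, B):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, x):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(A[i][k] * x[k] for k in range(3)) for i in range(3)]

def intrinsics(f, cx, cy):
    """Hypothetical pinhole intrinsics K and its closed-form inverse."""
    K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    K_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]
    return K, K_inv

def rot_y(angle):
    """Rotation about the vertical axis (a camera pan)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rotation_homography(K, K_inv, R):
    """H = K R K^-1 maps pixels of the first view to pixels of the rotated view."""
    return matmul(matmul(K, R), K_inv)

def project(K, R, X):
    """Project the 3-D point X with a camera at the origin and orientation R."""
    x = matvec(K, matvec(R, X))
    return [x[0] / x[2], x[1] / x[2]]

def apply_homography(H, p):
    """Apply H to the pixel p = (u, v) and dehomogenize."""
    x = matvec(H, [p[0], p[1], 1.0])
    return [x[0] / x[2], x[1] / x[2]]
```

Because the projection center does not move, the mapping is independent of the depth of the scene point, which is why no 3-D structure is needed to stitch the images.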

3.2.3 Image warping

This class of techniques is characterized by the use of a relatively small number of images, with geometric constraints applied to reproject image pixels appropriately at a given virtual camera viewpoint. It is sometimes also called image transfer. One early paper, which in some sense initiated this whole subfield, was by Chen and Williams [7], where only synthetic images were considered. The warping function is simply the downscaled offset vector map (a.k.a. optical flow field), which is precomputed from the known camera pose and range data. The depth of each pixel is cached when the frame is rendered the first time, which also helps in resolving visibility. Holes (areas where the pixels do not have pre-images) are filled by interpolating the offset vectors of adjacent pixels.

In Seitz and Dyer [8], two real images are taken. They are pre-warped to a common virtual image plane. This step guarantees that any linear interpolation of the pre-warped views is shape preserving, meaning that the synthesized view is free of non-linear distortion. Holes are filled using morphing techniques. The method does not extend to multiple views, however, because the pre-warping algorithm only works for two input views. In addition, the virtual camera's movement is limited to stay in between the model views, a result of using linear interpolation.

Arbitrary camera placement was considered by Havaldar et al. [43] by basing image transfer on projective invariants. However, again, only two views were dealt with. This work was broadened in Chen and Medioni [2] to allow multiple views, and in addition addressed the occlusion problem. Another merit of the latter work is that, unlike other warping methods, it is polygon based, and thus able to take advantage of hardware acceleration in implementation.

3.3 Related Work in Computer Vision

3.3.1 Epipolar geometry

Non-metric structure from two views was studied by Longuet-Higgins a long time ago [54]. He discovered what is now known as the epipolar geometry. Consider two cameras and a point in the scene. The scene point forms a plane with the line connecting the two projection centers. As the point moves in the world, it sweeps out a pencil-of-planes. The epipolar geometry relates the two pencils-of-lines which, as recalled from section 2.2.1, are images of the said pencil-of-planes. Algebraically, the relationship is described by a 3x3 matrix of rank 2 called the fundamental matrix. Intuitively, the rank 2 condition comes from the fact that a pencil is a projective 1-D structure. Due to the rank 2 condition, seven pairs of point correspondences suffice to produce a non-linear solution to the fundamental matrix. Alternatively, there are linear solutions for eight or more points (e.g. Hartley [4]).

The difficulty of epipolar geometry estimation lies in that matching points and computing the fundamental matrix must proceed concurrently. A widely referenced algorithm is the one by Zhang [], owing to its robustness. Still, it is not bulletproof. A thorough discussion of the related computational issues can be found in Deriche et al. [23] and Torr et al. [93].
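The linear solution for eight or more points can be sketched as follows. This is a minimal illustration rather than the implementation of any cited paper: it assumes the convention $p_1^T F p_2 = 0$, uses NumPy, and adds a commonly used coordinate normalization step that the text above does not describe. The function names are hypothetical.

```python
import numpy as np

def _normalize(pts):
    """Translate the centroid to the origin and scale the mean distance to sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2.0) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0.0, -s * c[0]],
                  [0.0, s, -s * c[1]],
                  [0.0, 0.0, 1.0]])
    homog = np.column_stack([pts, np.ones(len(pts))])
    return (T @ homog.T).T, T

def eight_point(pts1, pts2):
    """Estimate F (up to scale) from N >= 8 correspondences given as (N, 2) arrays."""
    x1, T1 = _normalize(np.asarray(pts1, dtype=float))
    x2, T2 = _normalize(np.asarray(pts2, dtype=float))
    # Each correspondence gives one linear equation x1^T F x2 = 0
    # in the nine entries of F (row-major order).
    A = np.einsum('ni,nj->nij', x1, x2).reshape(-1, 9)
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce the rank 2 condition by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    F = U @ np.diag(S) @ Vt
    # Undo the normalization: x1 = T1 p1 and x2 = T2 p2.
    F = T1.T @ F @ T2
    return F / np.linalg.norm(F)
```

With noisy or partly wrong matches this plain least-squares estimate degrades quickly, which is why robust estimators such as the one referenced above are preferred in practice.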

The fundamental matrix was the tool used to achieve pre-warping for view morphing in [8]. The same procedure was used in computer vision to align two stereo images, referred to as rectification (e.g. Hartley et al. [39] and Robert et al. [75]). Epipolar geometry can also be thought of as a constraint that binds corresponding pixels on the two image planes, and can be used, for instance, in image matching. However, the epipolar constraint is inherently ambiguous: all points on a plane belonging to the pencil-of-planes are projected to the same line in either view. On this line, it is impossible to distinguish those points that happen to have the same intensity value. Therefore, in practice, more reference views are needed to help resolve the ambiguity.

3.3.2 Trifocal tensor

Recently, Shashua [83] discovered a trilinear constraint that involves both points and lines in three views. The constraint can be elegantly represented by a 3x3x3 tensor. The tensor contains the set of coefficients of certain trilinear relationships that vanish on any corresponding triplet (any combination of points and lines). A robust algorithm to compute the trifocal tensor is given by Torr [92]. The trifocal tensor is more general than the epipolar geometry in the sense that (i) the three fundamental matrices for the three pairs of views can be obtained from the coefficients of the tensor; (ii) both point and line transfer are allowed. When used as a matching constraint, the trifocal tensor is less likely to incur the ambiguity problems. As a result, reconstruction algorithms based on it are expected to be more accurate.

For two views, a special tensor composed of the elements of the fundamental matrix can be constructed by considering the third view as being coincident with one of the given views. A rigid transformation can be applied to the tensor. The new tensor plus the original views suffice to synthesize a novel view consistent with the transformation [4] (Avidan and Shashua '97).

The advantages associated with the trifocal tensor do not come without a price: the constraint is algebraically over-parameterized. There are 27 coefficients, while three views can contribute at most 18 independent parameters. Taking into consideration the overall scale factor, eight additional constraints are needed in order to form a numerically stable solution. However, up to now, only 4 of them have been identified. Compared to estimating the fundamental matrix, the computation is more involved since there are many more coefficients to estimate. The lack of a more common notation than the tensorial one also makes it less appealing to the graphics community.

3.3.3 Quad-linearity

Investigations conducted by Faugeras et al. [29] and Triggs [94] have established the existence of a kind of quad-linear form with a total of 81 coefficients across four views, with the negative result that further views would not add any new constraints. This form of quad-linearity, however, is somewhat unsettling because the number of coefficients has risen to 81, whereas a view adds at most 12 parameters. In other words, it may be too redundant a representation of constraints over four views, and therefore may only be of theoretical interest.
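The parameter count behind the over-parameterization of the trifocal tensor can be written out explicitly. The two figures used here, 11 degrees of freedom per projection matrix and 15 for the overall projective ambiguity, are standard values assumed for this illustration rather than numbers stated in the text above.

```latex
\begin{align*}
\text{independent parameters of three views} &= 3 \times 11 - 15 = 18,\\
\text{additional constraints on the tensor} &= 27 - 1 - 18 = 8.
\end{align*}
```

The subtraction of 1 in the second line accounts for the overall scale factor of the tensor.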

3.3.4 Projective reconstruction

The dilemma in reconstruction algorithms based on epipolar geometry or the trifocal tensor is that, on one hand, large view separation helps to stabilize the computation. On the other hand, points appear quite different when views are too far away from one another, which makes establishing the correspondences difficult. A natural solution is to use a dense image sequence so that view separation and continuity are both achieved.

With uncalibrated cameras, a projective reconstruction can be reached. Two types of approaches have been reported so far. One is given by Mohr et al. [62], who adopted the idea of bundle adjustment from photogrammetry [98]. Projective bundle adjustment is formulated as a non-linear minimization, just like traditional bundle adjustment. However, two issues that either did not exist or were much less severe before become noticeable now. First, because both cameras and points need to be estimated, the problem size increases by a polynomial factor ($O(m^2 n + m n^2)$, where m and n are the numbers of cameras and points respectively). Second, and more problematic, our experiments indicate that the formulation is ill-posed, which appears as the degeneracy of the Jacobian matrix encountered in the minimization process. This problem is less severe in photogrammetry because, as its name suggests, bundle adjustment is only used as an adjustment to already good initial estimates, whilst Mohr et al. tried to apply it to very rough initial guesses.

The second approach (Sturm et al. [86]) uses matrix factorization with the requirement that all points appear in all views. They first form the measurement matrix whose rows are image coordinates of all points in one view, and whose columns are images of the same point in all views. They then estimate the fundamental matrix for each successive pair of views. From that, they compute the projective depths, and plug the latter into the original measurement matrix. By performing a rank-4 factorization on the resulting scaled measurement matrix, they obtain the projective reconstruction. More on this method will be described later.

The factorization method suggests a more general form of quad-linearity for multi-view geometry, which is characterized by the rank-4-ness of the scaled measurement matrix just introduced. It is more general because (i) it applies to an arbitrary number of views and (ii) all previous forms of linearity (namely, epipolar geometry, the trilinear tensor and quad-linearity) can be derived from the resulting projection matrices.

Requiring all points to be visible in all views is quite restrictive in practice. In tracking a video sequence, it is natural to see some tracked features appearing and disappearing. Depending on good epipolar geometry estimation is also undesirable since the latter is itself still an open issue.

3.3.5 Self-calibration

For most applications, however, a projective structure is not sufficient. Therefore, algorithms that convert projective structures to Euclidean ones ought to be sought. Two different paths have been suggested. One of them (Faugeras [3], Pollefeys and Van Gool [7]) takes the stratified approach. It first transforms the projective structure into an affine one by identifying the plane-at-infinity, and then into a Euclidean one by making use of 3-D knowledge about the scene such as the existence of a trihedral vertex.

The other path, referred to as self-calibration, reaches a Euclidean structure in one step. All existing algorithms (Faugeras et al. [27], Hartley [4], Pollefeys et al. [72], and Triggs [95]) are based on the invariance of the absolute conic (or quadric) under rigid motion. The self-calibration approach by Faugeras et al. [27] uses the Kruppa constraints which, again, require the fundamental matrix to be computed a priori. Hartley's approach [4] assumes a fixed rotating camera, and computes the image of the absolute conic. Triggs [95] took a different approach by computing the dual image of the absolute conic. Neither algorithm allows a varying focal length. Pollefeys et al. [72] presented a more general algorithm that is based on similar ideas but allows a variable focal length.

It should be pointed out that research on self-calibration has just started. Existing approaches are still immature, which draws split opinions about the subject. The supporting side believes this is a great idea since it totally bypasses the need for calibration. The opposing side argues that self-calibration is inherently unstable because too many parameters need to be estimated, and that traditional calibration should be used whenever possible. Our own experiments suggest that while self-calibration is not a good candidate for calibration purposes, it certainly provides sufficient accuracy for reconstruction-oriented applications. Such an application will be demonstrated later.

3.4 Hybrid Methods

The hybrid modeling approach is characterized by (i) making use of vision techniques to recover the object geometry, and (ii) rendering the recovered model via the standard graphics pipeline. Once a complete 3-D reconstruction has been assembled, the visibility issue (which appears in image warping methods) becomes a non-issue.

Rather than address the general structure-from-motion problem as computer vision researchers do, Debevec et al. [22] cleverly restrict their problem domain to architecture, and use primitives such as boxes, prisms and surfaces of revolution as building blocks for architectural models. Correspondences of primitives among different views are given manually. By utilizing the geometric constraints inherent in the primitives, they are able to reduce the problem space significantly, changing the ill-posed problem into a well-posed one. Source images are "pasted" onto the reconstructed models using texture mapping, a standard graphics technique.

The method presented by Seitz et al. [8] [82] avoids the image correspondence problem by working in a discretized scene space whose voxels are traversed in a fixed visibility order. The boundary voxels (of an object) are found by checking the color consistency of their projections in multiple views. The voxels are subsequently colored with the average color.

The success of hybrid methods relies on the use of vision techniques to reconstruct a full 3-D model of the scene (as compared to warping methods, which stay in the 2-D image space), while at the same time avoiding the correspondence problem in its most general form. The disadvantage of Debevec's method is that it only works in its designated domain. Seitz's method is slow due to the use of volume-rendering techniques to generate the final polygonal model. Both methods require that precise camera calibration data be known beforehand.

3.5 Summary

Among all the approaches surveyed so far, only the light-field approach is non-geometric, and therefore does not need the establishment of correspondence. The trade-off is the requirement of extremely large storage space. It is not clear whether the reported techniques are feasible for widely spread scenes where the 3-D sub-function becomes insufficient.

The rest of the approaches all depend, one way or another, on recovering some kind of geometric structure of the scene. Despite the original attempt to bypass explicit structure recovery (and thus calibration too), successful examples have only been reported for special cases where the cameras are either not moving or moving in a planar fashion. In general cases, due to the inability to resolve occlusion in 2-D image space, one has to resort to pixel-based rendering methods, departing from the ongoing trend, which is already polygon-based.

As a result, in order to derive general solutions that are widely applicable and use only a small set of input images, one needs to develop new algorithms that output explicit Euclidean structures from uncalibrated images.

Chapter 4 IBRM from Two Views

This chapter presents our work on the simplest case of IBRM, that is, with only two images. The binocular stereo technique is used: the two pictures are shot from close-by locations. Briefly, stereo works as follows. For a given pixel in, say, the left image, one searches for its correspondence in the right image along the corresponding epipolar line based on some local similarity measure. Then, by intersecting the two rays emitted from the two pixels, one recovers the 3-D location of the point. All recovered 3-D points are connected to form a polygonal mesh representing the shape of the object. The input images can be texture-mapped onto the mesh, generating new photo-realistic images at arbitrary viewpoints.

A divide-and-conquer approach is taken, i.e. the overall problem is divided into several sub-problems (image rectification, matching, projective reconstruction and Euclidean reconstruction), which are tackled separately. The stereo problem is often treated as a matching problem only, with the assumption that the cameras are calibrated. Under uncalibrated scenarios, however, the first and the third problems have to be addressed.

A face reconstruction system is then developed using off-the-shelf digital cameras. One important application of such a system is telepresence, in which several geographically separated participants are brought into one virtual environment. The idea is to capture distilled information about an individual, such as the head position, orientation and, possibly, facial expression, then use this information to drive the face model of an avatar at the remote site.

4.1 Review of Related Work

As mentioned earlier, the stereo problem is traditionally treated as a matching problem, that is, finding all pairs of pixels that are images of common 3-D points. The reason is that when the cameras are fully calibrated (their intrinsic and extrinsic parameters are known), rectification and reconstruction become trivial issues. In this section, works on stereo matching are reviewed. For a comprehensive survey, readers are referred to Dhond et al. [24].

4.1.1 Aerial triangulation

Shape recovery from aerial images was first introduced in photogrammetry [98] under the name aerial triangulation. Here the word triangulation comes from the fact that any ground point and its two projections onto the two cameras form a triangle. The cameras must be calibrated. To produce accurate results, the plane must fly along a predetermined path. Pass points on the ground are marked and surveyed, and are used to connect multiple areas in a linear fashion, hence the name strip adjustment. This is depicted in Figure 4.1. Sometimes, multiple strips are further tied together via tie points. This operation is called bundle adjustment, as shown in Figure 4.2.

In general, aerial triangulation is labor-intensive work. Special machines are built for human operators to match the aerial images manually. All pass points and tie points must be surveyed by sending a team to the field. Many flight paths are needed to decide the final flight path.

Figure 4.1 Aerial triangulation and strip adjustment

Figure 4.2 Bundle adjustment in photogrammetry

4.1.2 Binocular stereo

The same problem was studied in computer vision under the name binocular stereo with the goal of automatic matching. The central issue here is how to use image information, which is quite limited, rather than high-level knowledge to match the corresponding pixels in the two images. This is where the epipolar constraint comes into play. It reduces a two-dimensional search to a one-dimensional search. Cox et al. [2] and Ohta et al. [65] are representative of this type of method. These methods in addition incorporate the ordering constraint, using dynamic programming. The disadvantage is that inter-scanline coherence is ignored, leading to artifacts in the output.

Including global constraints in the search process naturally makes stereo matching a constrained functional optimization problem. The solution function d(u,v) defines a regionally smooth surface over one of the image planes (e.g. the left one). The surface is normally referred to as the disparity surface. Therefore what is actually achieved by the optimization scheme is to locate, in an abstract 3-D space, the disparity surface. The approach by Wei et al. [97] is the latest example along this line of work. There, the disparity surface is parameterized as a function with a Gaussian basis to reduce the number of parameters, a typical technique used in variational methods. The result, not surprisingly, looks overly smoothed.

The work by Hoff et al. [44] explicitly removes the outliers using the Hough transform before fitting the surface. Since the Hough transform only applies when the algebraic form of the surface is known, they divide the surface into multiple planar patches. It is exactly this division that limits the robustness of Hoff's method because global constraints are broken at the boundaries between patches.

A more appealing solution is provided by Lee and Medioni [52], where each potential match is represented by a tensor (or ellipsoid) that captures location, normal and uncertainty. During tensor voting, consistent matches enhance one another, increasing the confidence in the normal estimation, while inconsistent matches interfere with one another, increasing the uncertainty. The voting process results in a volumetric field inside which the disparity surface is the most salient surface. The surface is extracted in a subsequent step using volume-rendering techniques (Tang and Medioni [9]). The problem with Lee's work lies in the scale issue (refer to her dissertation [5]): the range of each vote is uniformly preset. Another disadvantage is that the voting process is time-consuming. The reason is that the convolution kernel is 3-dimensional and has to be aligned with the normal at each data site by applying a 4×4 transformation.

Recently, Ishikawa et al. [45] and Roy et al. [77] have proposed a global optimization formulation for the general multi-baseline stereo problem. The formulation is quite powerful because it incorporates both inter- and intra-scanline constraints, and is able to model occlusions and discontinuities. The optimization problem is solved by transforming it into a maximum-flow problem. Then, the minimum cut associated with the maximum flow yields the disparity surface. Both works, however, failed to demonstrate examples involving more than two cameras, probably because doing so requires a calibrated dataset. The theoretical bound and actual execution time reported by the authors suggest that their methods are computationally demanding.

4.2 Epipolar Geometry in Detail

Here, we introduce the epipolar geometry, which is the basic mathematical tool for our stereo system. More conceptual details can be found in Faugeras [28]. The articles by Zhang ([] and []) deal with computational issues. Zhang also made his software available for public access.

Figure 4.3 Epipolar geometry

Graphically, epipolar geometry is depicted in Figure 4.3, where $P$ and $P'$ are 3-D scene points; $p_1$, $p_2$ are $P$'s images; $O_1$, $O_2$ are the camera projection centers; and the line $O_1 O_2$ is called the baseline. Notice that the two triangles $O_1 O_2 P$ and $O_1 O_2 P'$ come from a pencil-of-planes, which is projected to the pencil-of-lines in each image plane. The lines of the latter (e.g. $l_1$ and $l_2$) are the epipolar lines. The intersection of each pencil-of-lines is called the epipole ($o_1$, $o_2$). An epipole plays several roles simultaneously. It is the intersection of all the epipolar lines. It is the intersection of the baseline with the image plane. It is also the projection of a camera projection center onto the counterpart image plane.

Algebraically, epipolar geometry is described by the following equation:

$$p_1^T F p_2 = 0$$

where $F$ is a 3x3 rank-2 matrix called the fundamental matrix, and $p_1$ and $p_2$ are the 3-tuple homogeneous coordinates of corresponding pixel points. The equation reveals the fact that $p_1$ is located on the epipolar line defined by $F p_2$. The relationship is symmetric: $p_2$ is on the line defined by $F^T p_1$. Since $o_1^T F p_2 = 0$ for any $p_2$, $F^T o_1 = 0$. Thus $o_1$ is the null vector of $F^T$, which reflects the fact that $F$ is of rank 2. Similarly, $o_2$ is the null vector of $F$.

It is observable from Figure 4.3 that if an image plane is parallel to the baseline, then its epipole is at infinity. In that case, all epipolar lines on that image plane are parallel.

4.3 Rectification

In general, the two cameras used to take the stereo images are not parallel. The task of rectification is to achieve this effect numerically, so that the image planes become coplanar and parallel to the baseline. The result is that the scanlines in each image plane are parallel to one another, and corresponding ones from both images are horizontally coincident.

Traditionally, this step requires fully calibrated cameras. Hartley et al. [39] gave an algorithm based on finding a special factorization of the fundamental matrix. Here, we present a more intuitive algorithm that is also based on the fundamental matrix. It works well for two close-by images, which is the case in stereo.
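The role of the epipoles as null vectors can be illustrated with a small sketch. It assumes the convention $p_1^T F p_2 = 0$ and an exactly rank-2 matrix, for which a null vector is the cross product of two independent rows. The function names and the example matrix are hypothetical illustrations, not part of the system described here.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def transpose(M):
    """Transpose of a 3x3 matrix given as nested lists."""
    return [[M[j][i] for j in range(3)] for i in range(3)]

def null_vector(M):
    """Null vector x (M x = 0) of a rank-2 3x3 matrix.

    x is orthogonal to every row of M, so it is the cross product of two
    independent rows; the pair with the largest cross product is used.
    """
    candidates = [cross(M[0], M[1]), cross(M[0], M[2]), cross(M[1], M[2])]
    return max(candidates, key=lambda v: sum(c * c for c in v))

def epipolar_line_in_view1(F, p2):
    """Line F p2 in the first image on which the match of p2 must lie."""
    return [sum(F[i][k] * p2[k] for k in range(3)) for i in range(3)]

def epipoles(F):
    """Return (o1, o2) in homogeneous coordinates, up to scale."""
    o1 = null_vector(transpose(F))  # F^T o1 = 0
    o2 = null_vector(F)             # F o2 = 0
    return o1, o2
```

For the matrix F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]], the constraint reduces to equal y coordinates in both images, both epipoles have a zero third coordinate (they lie at infinity), and every epipolar line is a horizontal scanline. This is the configuration that rectification aims for.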

In Figure 4.4, l1, r1 and l2, r2 are two pairs of corresponding epipolar lines. v1 is the average of the y coordinates of the four end points (two from l1, the other two from r1), and v2 is that of l2 and r2. The rectification transformation, one for each image, maps the trapezoid (black) to the rectangle (gray). The process is demonstrated in Figure 4.5. The software provided by Zhang [] is used for the fundamental matrix computation.

Below, we briefly explain why this algorithm works. First, recall from section 2.2.1 that the cross-ratio between the two pencils-of-lines in the two image planes is unchanged, and from section 2.1.5 that the cross-ratio is invariant to homography. Second, within each pencil, the line that is at infinity is the one that passes through the image origin, because it possesses the canonical form [a, b, 0]. Third, after the rectification, all three bases are aligned: two of them are the top and bottom edges of the trapezoid, and the special one mentioned above is mapped to the X-axis. This fact, together with the invariance of the cross-ratio, forces the alignment of the corresponding epipolar lines (or scanlines after rectification).

Figure 4.4 Illustration of the rectification algorithm

Figure 4.5 The original and the rectified stereo pair: (a) original input superimposed with matched points; (b) original input superimposed with epipolar lines; (c) rectification results

4.4 Image Matching

In this section, it is assumed that the images have been aligned. We can thus encode the correspondence information by a function d(u,v) defined over the left image plane (denoted as $I_1$) such that (u,v) and (u+d(u,v), v) are a pair of corresponding pixels. Geometrically, d(u,v) defines the previously mentioned disparity surface. Assuming corresponding pixels have similar intensities (colors), and letting Φ denote a similarity function such that larger values mean more similar pixels, matching can be formulated as a variational problem:

$$D(u,v) = \max_{d(u,v)} \iint_{I_1} \Phi(u, v, d(u,v)) \, du \, dv \qquad (4.1)$$

One simple solution to (4.1) is to sample over all possible values of u, v, and d, followed by an exhaustive search in the resulting volume. There are two issues: one is efficiency, i.e. how to perform the search in a time-efficient way; the other is robustness, i.e. how to avoid local extrema. The method presented below addresses both issues. The core idea is to treat d(u,v) geometrically as a surface in a volume instead of as an algebraic function, and to extract the surface by propagating from seed voxels which have a high probability of being correct matches.

The u-v-d volume

We use the normalized cross-correlation over a window as the similarity function:

$$\Phi(u, v, d) = \frac{\mathrm{Cov}(W_l(u,v),\, W_r(u+d,v))}{\mathrm{Std}(W_l(u,v)) \; \mathrm{Std}(W_r(u+d,v))} \qquad (4.2)$$

where $W_l$ and $W_r$ are the intensity vectors of the left and right windows of size ω centered at (u,v) and (u+d,v) respectively, d is the disparity, Cov stands for covariance and Std for standard deviation. The width and height of the (left) image together with the range of d form the u-v-d volume. The range of Φ is [-1, 1]. When Φ is close to 1, the two pixels are well correlated, and hence have a high probability of being a match. When Φ is close to zero, that probability is low. In implementation, a threshold needs to be set. We discuss how to choose its value in the next subsection.

Disparity surface extraction

The fact that Φ is a local maximum when (u,v,d) is a correct match means that the disparity surface is composed of voxels with peak correlation values. Matching two images is therefore equivalent to extracting the maximal surface from the volume. Since the u-v-d volume is very noisy, simply applying the Marching Cubes algorithm [55] would easily fall into the trap of local maxima. We thus implemented a propagation algorithm [46]. In addition, we make use of the disparity gradient limit [], which states that $|\partial d / \partial u| < 1$. Use of this constraint in the scanline direction is equivalent to the ordering constraint often used in scanline-based algorithms (e.g. [2] by Cox et al.). Using it in the direction perpendicular to the scanlines enforces smoothness across scanlines, which is only partially enforced in inter-scanline based algorithms such as the one presented by Ohta and Kanade [65].
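The similarity function of equation (4.2) can be sketched in a few lines of Python. This is an illustration under stated assumptions: images are nested lists indexed as img[v][u], ω is taken to be the odd side length of a square window, and windows that leave the image or have zero variance return 0. These choices, like the function names, are hypothetical and are not specified in the text.

```python
import math

def ncc(w_left, w_right):
    """Normalized cross-correlation of two equally long intensity vectors."""
    n = len(w_left)
    mean_l = sum(w_left) / n
    mean_r = sum(w_right) / n
    cov = sum((a - mean_l) * (b - mean_r) for a, b in zip(w_left, w_right)) / n
    var_l = sum((a - mean_l) ** 2 for a in w_left) / n
    var_r = sum((b - mean_r) ** 2 for b in w_right) / n
    if var_l == 0 or var_r == 0:
        return 0.0  # assumption: a textureless window carries no evidence
    return cov / math.sqrt(var_l * var_r)

def window(img, u, v, omega):
    """Flattened omega x omega window centered at column u, row v, or None."""
    h = omega // 2
    if v - h < 0 or u - h < 0 or v + h >= len(img) or u + h >= len(img[0]):
        return None
    return [img[y][x] for y in range(v - h, v + h + 1)
            for x in range(u - h, u + h + 1)]

def phi(left, right, u, v, d, omega=3):
    """One voxel of the u-v-d volume: correlation of (u, v) with (u + d, v)."""
    w_l = window(left, u, v, omega)
    w_r = window(right, u + d, v, omega)
    if w_l is None or w_r is None:
        return 0.0
    return ncc(w_l, w_r)
```

Because both the mean and the standard deviation are normalized out, the measure is unaffected by a gain and offset difference between the two cameras, which is one reason it is a common choice for stereo pairs taken with uncalibrated consumer cameras.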

Algorithm description

The output from our matching algorithm is the disparity map, which corresponds to the voxels that comprise the disparity surface. This is where it differs from volume rendering, or from other matching methods that model the disparity surface as a continuous function. The algorithm proceeds in two steps.

Step 1. Seed voxel selection

A voxel (u,v,d) is a seed voxel if (i) it is unique, meaning that for the pixel (u,v) there is only one local maximum at d along the scanline v, and (ii) Φ(u,v,d) is greater than a threshold t1. A seed necessarily resides on the disparity surface. Otherwise, the true surface point (u,v,d'), for which d' ≠ d, would be a second local maximum.

To find seeds, the image is divided into a number of buckets. Inside each bucket, pixels are checked randomly until either one seed is found, or all pixels have been searched without success. During the search, the voxel values are cached to save computation time for the next step. The value of t1 determines the confidence of the seed points and is set close to 1. In our experiments, we start from 0.995, trying to find at least a minimum number of seeds. If too few seeds are found, the value is decreased. In all the examples tried so far, we have found the range of t1 to be between 0.993 and 0.996.

Step 2. Surface tracing

Simultaneously from all seed voxels, the disparity surface is traced by following the local maximal voxels whose correlation values are greater than a second threshold t2. The $|\partial d / \partial u| < 1$ constraint means that when moving to a neighboring voxel, only those at d, d-1 and d+1 need to be checked. Initially, we store the seed voxels in a FIFO queue. After tracing starts, we pop the head of the queue every time, and check the 4-neighbors of the corresponding pixel (border pixels need special treatment). When two surface fronts meet, the one with the greater correlation value prevails. If any new voxels are generated, they are pushed to the end of the queue. This process continues until the queue becomes empty.

To enforce smoothness, we assign (u', v', d) higher priority than (u', v', d-1) and (u', v', d+1). To obtain sub-pixel accuracy, a quadratic function is fitted at (u', v', d'-1), (u', v', d'), and (u', v', d'+1), where (u', v', d') is the newly generated voxel. t2 determines the probability that the current voxel is on the same surface that is being traced. It turns out that the value of t2 is not critical. In all the examples tried so far, the value 0.6 is used. Pseudo code of the tracing algorithm is given in Figure 4.6.

Complexity

The worst-case complexity of the seed selection part is bounded by O(WHDω), where W and H are respectively the width and height of the image, D is the range of the disparity, and ω is the size of the correlation window. That of the tracing part is bounded by O(WHω). Since some voxels have already been computed during the first step, this limit is never reached. Actual running times on examples are given later. Note that, in this case, each image plane is expected to be traversed at least once. Thus the lower bound of the complexity is O(WH).
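The tracing step and the sub-pixel refinement described above can be sketched in Python as follows. This is an illustrative sketch, not the original implementation: the correlation is passed in as a callable phi(u, v, d), border pixels are simply given fewer neighbors, the disparity range [d_min, d_max] is an explicit parameter, and the sub-pixel offset uses the vertex of the parabola through three samples. All names are hypothetical.

```python
from collections import deque

def trace_disparity_surface(phi, seeds, width, height, d_min, d_max, t2=0.6):
    """Grow the disparity surface from seed voxels; returns {(u, v): d}."""
    seeds = list(seeds)
    disparity = {(u, v): d for u, v, d in seeds}
    queue = deque(seeds)  # FIFO queue of voxels on the surface front
    while queue:
        u, v, d = queue.popleft()
        for du, dv in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            un, vn = u + du, v + dv
            if not (0 <= un < width and 0 <= vn < height):
                continue  # border pixels have fewer than four neighbors
            # d comes first so that it wins ties (smoothness priority)
            candidates = [c for c in (d, d - 1, d + 1) if d_min <= c <= d_max]
            best = max(candidates, key=lambda c: phi(un, vn, c))
            if (un, vn) in disparity:
                # two fronts meet: the greater correlation prevails
                old = disparity[(un, vn)]
                if phi(un, vn, best) > phi(un, vn, old):
                    disparity[(un, vn)] = best
            elif phi(un, vn, best) > t2:
                disparity[(un, vn)] = best
                queue.append((un, vn, best))
    return disparity

def subpixel_offset(f_minus, f_zero, f_plus):
    """Offset of the parabola vertex through samples at d-1, d and d+1."""
    denom = f_minus - 2.0 * f_zero + f_plus
    if denom >= 0:
        return 0.0  # the middle sample is not a strict maximum
    return 0.5 * (f_minus - f_plus) / denom
```

In a real system phi would read from the cached u-v-d volume rather than recompute the correlation, since each voxel may be queried several times during tracing.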

Algorithm 1. Disparity Surface Tracing

Initialize Q with the seed voxels;
While (not empty Q) {
    Set (u, v, d) = pop Q;
    For each 4-neighbor of (u, v) {
        Call it (u', v');
        Choose among (u', v', d-1), (u', v', d), (u', v', d+1) the one with the max correlation value and call it (u', v', d');
        if (u', v') already has a disparity d"
            disparity(u', v') = Φ(u', v', d') > Φ(u', v', d") ? d' : d";
        else if Φ(u', v', d') > t2 {
            disparity(u', v') = d';
            push (u', v', d') to the end of Q;
        }
    }
}

Figure 4.6 Pseudo code of the tracing algorithm

Extraction on the Hervé's Face stereo pair

Figure 4.7 (a) shows two perpendicular slices of the u-v-d volume computed from the Hervé's Face stereo pair. Lighter pixels indicate higher Φ (correlation) values. Clearly, the volume is quite noisy. This explains why directly solving the variational problem (4.1) is difficult: how can a reasonable initial solution be found in the first place? The same slices after non-maxima suppression are shown in (b). We see that local maxima exist in many places. Therefore simply using a local similarity measurement for correspondence search could be disastrous. (c) demonstrates what the volume looks like after tracing. Now, the disparity surface appears clearly. In this example, W=512, H=384, ω=9, and the disparity range is [-59, 3]. Running on an SGI O2 with default values for t1 and t2, step 1 takes 2696 seconds, which results in 468 seeds; step 2 takes 24 seconds.

Figure 4.7 Results on Hervé's Face stereo pair: (a) two cuts through the original u-v-d volume; (b) after non-maxima suppression; (c) after tracing


Figure 4.8 Output disparity map of the Hervé's Face pair

4.4.4 Improving time efficiency

Obviously, the bottleneck of the previous extraction algorithm lies in the seed selection part. To improve time efficiency, we modify the algorithm so that it proceeds in a multiresolution fashion: only at the coarsest level is the full volume computed; at all subsequent levels, seeds are inherited from the previous level. To guarantee the existence of seeds at the coarsest level, we replace the uniqueness condition (in step 1, seed selection) by a winner-take-all strategy, that is, at each (u,v), we compute all voxels (u,v,d) where d ∈ [-W/2, W/2] and choose the one that has the maximum correlation value. Under this relaxed condition, some seeds may represent incorrect matches. To deal with this, we assign the seeds randomly to five different layers. As a result, five disparity maps are generated at the end of tracing. Comparing them lets us identify and remove wrong matches. If no

agreement can be reached, we leave that point unmatched. At each level, extraction is performed for both the left and right images. Cross-checking is then conducted: those pixels whose left and right disparities differ by more than one pixel are eliminated and recorded as unmatched. At the finest level, small holes are filled starting from the borders. Figure 4.8 shows the final disparity map resulting from the improved algorithm. The execution time is reduced to 429 seconds, about 1/6 of the previous version. Assuming the reduction rate is 4 and the size of the correlation window is constant over all resolutions, the time complexity is reduced to O(WHω). Another merit of the multiresolution version is that there is no need to prescribe a value for D.

4.5 Reconstruction

In the reconstruction stage, the correspondence information is transformed into 3-D Euclidean coordinates of the matched points. A two-step approach (projective reconstruction followed by Euclidean reconstruction) is taken, which eliminates the necessity of camera calibration. The two steps are extended in Chapter 5 and Chapter 7, respectively, to handle multiple views.

4.5.1 Projective reconstruction by matrix factorization

In [7] [9], Kanade et al. described a reconstruction algorithm using matrix factorization. Denote the projections of n points in two views as $[u_{ij}, v_{ij}]$, where i=1,2 and j=1,…,n. The following measurement matrix is defined:

$$W = \begin{bmatrix} u_{11} & \cdots & u_{1n} \\ v_{11} & \cdots & v_{1n} \\ u_{21} & \cdots & u_{2n} \\ v_{21} & \cdots & v_{2n} \end{bmatrix}.$$

The authors observed that, under orthographic or para-perspective projection, the aforementioned matrix is of rank 3. Then, a rank-3-factorization of the measurement matrix gives the affine reconstruction. One advantage of their algorithm is that all points are used concurrently and uniformly. In applying the idea to perspective projection models, Chen and Medioni [4] show that the following modified measurement matrix is of rank 4:

$$W = \begin{bmatrix} u_{11} & \cdots & u_{1n} \\ v_{11} & \cdots & v_{1n} \\ 1 & \cdots & 1 \\ u_{21} & \cdots & u_{2n} \\ v_{21} & \cdots & v_{2n} \\ 1 & \cdots & 1 \end{bmatrix}$$

where each column denotes a pair of corresponding pixels after rectification. Thus a rank-4-factorization produces a projective reconstruction (section 2.2.2):

$$W = \begin{bmatrix} P_1 \\ P_2 \end{bmatrix} [\,Q_1 \cdots Q_n\,] = PQ, \qquad (4.3)$$

where $P_1$ and $P_2$ are the 3×4 matrices of the two cameras, and the $Q_j$'s are the homogeneous coordinates of the points. Appendix A.1 gives a method of computing such a factorization using Singular Value Decomposition (SVD). Next we transform the so far obtained projective reconstruction into the first canonical form (section 2.2.3), which is a prerequisite of our Euclidean reconstruction algorithm.
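The rank-4 factorization step can be illustrated with a short NumPy sketch. It is a generic truncated-SVD factorization, not a transcription of Appendix A, and the synthetic rectified pair (identical intrinsics, no rotation, baseline along the image x axis) as well as every numeric value are hypothetical test data.

```python
import numpy as np

def rank_k_factorization(W, k):
    """Split W (rows x n) into P (rows x k) and Q (k x n); P @ Q is the best rank-k fit."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k, :]

def project(P, X):
    """Project homogeneous points X (4 x n) and normalize to rows u, v, 1."""
    x = P @ X
    return x / x[2]

# Hypothetical rectified stereo pair observing 20 random points.
rng = np.random.default_rng(0)
K = np.array([[500.0, 0.0, 256.0], [0.0, 500.0, 192.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])
X = np.vstack([rng.uniform(-1, 1, (2, 20)),
               rng.uniform(4, 6, (1, 20)),
               np.ones((1, 20))])

W = np.vstack([project(P1, X), project(P2, X)])   # 6 x n measurement matrix
P, Q = rank_k_factorization(W, 4)
residual = np.linalg.norm(W - P @ Q)              # close to zero for rectified views
```

Because both cameras share an image plane here, each point has the same depth in both views, which is why the unscaled 6×n matrix factorizes exactly with k = 4.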

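Let $P_1 = [\,P_{11} \mid p_1\,]$. It is known from [28] that $C_1 = -P_{11}^{-1} p_1$ is the first projection center. The stereo rig can be translated so that $C_1$ is coincident with the world origin. Let the translation matrix be

$$B = \begin{bmatrix} I & -P_{11}^{-1} p_1 \\ 0^\top & 1 \end{bmatrix}, \quad\text{then}\quad P_1 B = [\,P_{11} \mid p_1\,]\,B = [\,P_{11} \mid 0\,].$$

Thus

$$W = P B\, B^{-1} [\,Q_1 \cdots Q_n\,] = \begin{bmatrix} P_{11} & 0 \\ P_{22} & p_2 \end{bmatrix} [\,Q'_1 \cdots Q'_n\,] \qquad (4.4)$$

is the desired canonical form for stereo projective reconstruction, where $[\,P_{22} \mid p_2\,]$ is the second camera matrix after the translation and $Q'_j = B^{-1} Q_j$.

4.5.2 Euclidean reconstruction

Now that the world coordinate system (the origin and the axes) is coincident with that of the first camera, from section 2.2.2, Euclidean reconstruction is equivalent to finding the Projective Distortion Matrix H such that

$$[\,P_{11} \mid 0\,]\, H = A_1 [\,I \mid 0\,], \qquad (4.5a)$$

and

$$[\,P_{22} \mid p_2\,]\, H = \mu A_2 [\,R \mid T\,], \qquad (4.5b)$$

where µ compensates for the relative scaling between the two equations. Since H is defined up to a scale factor, we set one of its elements to be 1: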

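$$H = \begin{bmatrix} H_1 & h_1 \\ h^\top & 1 \end{bmatrix},$$

where $H_1$ is 3×3 and $h_1$, $h$ are 3-vectors. Then, (4.5a) becomes

$$[\,P_{11} \mid 0\,] \begin{bmatrix} H_1 & h_1 \\ h^\top & 1 \end{bmatrix} = [\,P_{11} H_1 \mid P_{11} h_1\,] = [\,A_1 \mid 0\,],$$

which implies $h_1 = 0$ and $H_1 = P_{11}^{-1} A_1$. Thus

$$H = \begin{bmatrix} P_{11}^{-1} A_1 & 0 \\ h^\top & 1 \end{bmatrix}. \qquad (4.6)$$

Plugging (4.6) into (4.5b),

$$[\,P_{22} \mid p_2\,] \begin{bmatrix} P_{11}^{-1} A_1 & 0 \\ h^\top & 1 \end{bmatrix} = [\,P_{22} P_{11}^{-1} A_1 + p_2 h^\top \mid p_2\,] = \mu\,[\,A_2 R \mid A_2 T\,],$$

which generates

$$M = \begin{bmatrix} M_1 \\ M_2 \\ M_3 \end{bmatrix} = P_{22} P_{11}^{-1} A_1 + p_2 h^\top = \mu A_2 R = \mu \begin{bmatrix} f_2 R_{(1)} \\ f_2 R_{(2)} \\ R_{(3)} \end{bmatrix} \qquad (4.7a)$$

and

$$\mu A_2 T = p_2. \qquad (4.7b)$$

Since R is a rotation matrix, (4.7a) further expands into the following 5 constraints on $f_1$, $f_2$, and $h$:

$$M_1 M_2^\top = M_1 M_3^\top = M_2 M_3^\top = 0, \qquad (4.8a)$$

$$\|M_1\| = \|M_2\| = f_2 \|M_3\|. \qquad (4.8b)$$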

Once $f_1$, $f_2$, and $h$ are computed, H can be obtained from (4.6). R, T and µ are obtained from (4.7). To determine the initial value for H, let $R \approx I$, $\mu \approx 1$, and $A_1 \approx A_2 = A$ (with a common focal length f), since the two cameras have similar orientation and focal length. It follows that the upper-left 3×3 block of H is $P_{11}^{-1} A$ and that $p_2 h^\top = (I - P_{22} P_{11}^{-1}) A$. Thus, an approximate Euclidean reconstruction can be achieved solely depending on f. We have developed an interactive tool to let the user input f and adjust its value until the approximation looks reasonable.

4.6 Results

4.6.1 Pentagon pair

Figure 4.9 (a) is the input rectified pair (512 × 512). We use it to test our matching algorithm. Figure 4.9 (b) is the resulting disparity map and (c) is a shaded view of the reconstruction. Notice that some details such as the concourse, the freeway, and the small "T" shape roof on the lower-right part of the building have all been captured. Edges are smoothed, however, since the algorithm is area-based only.

Figure 4.9 Results of the Pentagon pair: (a) input image pair; (b) disparity map; (c) reconstruction. Notice the T shape building inside the circle.

Figure 4.10 Results on the Renault Auto Part pair: (a) input; (b) tracing result; (c) all local maxima; (d) two textured views of the reconstruction.


4.6.2 Renault Automobile Part

Results on this example are shown in Figure 4.10, where (a) is the input pair and (b) shows the disparity surface obtained from the tracing procedure. For comparison, (c) shows all local maxima in the u-v-d volume, which is obviously quite noisy. It is clear that a brute-force method would produce disastrous results. Since we only have the rectified images, but not the rectification homography, only an approximate Euclidean structure can be obtained. However, since the vergence angle is very small, the distortion is also small, as evidenced in (d).

4.6.3 Hervé's Head

Figure 4.11 illustrates results on the "Hervé's Head" pair. The pair is taken by a single 6mm camera with four mirrors to guarantee that there is no motion of the subject. It can be seen in the figure that even the highlight pixels on the nose have been correctly matched. It is also worth mentioning that the shape of both the upper and lower lips has been captured as well, as can be seen in the right-most picture.

Figure 4.11 Results on the Hervé's Head pair

4.6.4 A full head example

A Styrofoam head is used as the testing object. Six stereo images are obtained by placing it on a rotary table while keeping the cameras fixed. The rotation angles are 0, 60, 90, 180, 270 and 300 degrees. The two extra views at 60 and 300 help to increase accuracy near the frontal face. Each stereo pair contributes a slice of the head (point cluster). All slices are then fused in a common coordinate system by rotating them by the known angle. The result is shown in Figure 4.12, where (a) is one of the images at the 60 angle, (b) is the disparity map and (c) shows two views of the fused data set.

Spurious points exist in the reconstruction. Some of them are sparse and random, and therefore relatively easy to remove. Others tend to form small surface patches. These points mainly come from the boundary areas of the stereo pairs where the foreground melts into the background, a known problem in stereo matching. The noise removal issue is not addressed in our work. Instead, the software written by Tang [9] is used. As seen in Figure 4.13, not only are all the spurious points eliminated, the large empty area on the top of the head is also closed. The theoretical foundation of the software is called Tensor-voting, proposed by Medioni and Lee [52][9]. Thorough descriptions of the theory and the implementation algorithms supporting Tang's software can be found in the book by Medioni, Lee and Tang [59].
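The fusion step (rotating each slice back by its known turntable angle) can be sketched as below. The conventions are assumptions made for illustration only: the rotary axis is taken to be the y axis through the origin of the reconstruction frame, angles are in degrees, and a slice captured at table angle θ is mapped back with a rotation of −θ.

```python
import numpy as np

def rotate_about_y(points, angle_deg):
    """Rotate an (n, 3) point array about the y axis by angle_deg."""
    t = np.radians(angle_deg)
    c, s = np.cos(t), np.sin(t)
    R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return points @ R.T

def fuse_slices(slices):
    """slices: list of (table_angle_deg, (n, 3) points). Returns one stacked cloud."""
    return np.vstack([rotate_about_y(pts, -angle) for angle, pts in slices])
```

With a different axis or sign convention only `rotate_about_y` changes; the stacking logic stays the same.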

Figure 4.12 Reconstructed full head model: (a) one of the 60 pair; (b) the disparity map of this pair; (c) different views of the fused reconstruction.


Figure 4.13 Results from tensor-voting

4.7 A Face Reconstruction System

To further explore the potential of our algorithm, we have built a Face System using two FUJI DS-300 digital cameras and a synchronizing device (Figure 4.14). The cameras output uncompressed color images in TIFF format with 640x480 resolution. The color information is used for texture mapping only.

We decided to work on faces for two primary reasons. First, faces are hard to reconstruct due to their smooth shapes and lack of prominent features in most areas. Second, there are plenty of applications that may make use of such a system, for instance, teleconferencing and animation. We are collecting a growing gallery of face sets; some of them, together with the acquired models, are shown in Figure 4.15 and Figure 4.16. In each example, the

first two images are the input pair; the remaining ones are the reconstructed models. In general, we are pleased with these results.

Figure 4.14 Our stereo system made from two FUJI DS-300

Figure 4.15 Examples and results from the Face System (subjects: Alexandre Francois, Prof. Max Nikias, Stefan Hinz)

Figure 4.16 More examples and results (subjects: Prof. Gérard Medioni, Qian Chen, Dr. Zhengyou Zhang)

Chapter 5 Multi-view Projective Reconstruction

This chapter begins the effort of generalizing the reconstruction algorithms developed in the previous chapter to the multi-view case, starting with multi-view projective reconstruction. The algorithm for Euclidean reconstruction is presented later, in Chapter 7.

As said in the previous two chapters, research on the topic of structure-from-motion started a long time ago. The approach taken by the photogrammetrists (see the monograph by Wolf [98]) was bundle-adjustment, which requires a priori knowledge of both the extrinsic parameters (location and orientation) of the cameras and the intrinsic parameters (including focal length). Later, it was shown that the requirement on extrinsic parameters could be relaxed (e.g. Ramesh Jain [48]), leaving an unresolved global scale which in many cases is not needed anyway. Recently, Faugeras [26] proved that when neither the extrinsic nor the intrinsic camera parameters are known, in other words, when the cameras are totally uncalibrated, then, under the perspective camera model, a projective reconstruction can be reached which differs from the Euclidean one by a 4×4 transformation. In the meantime, Kanade et al. [7] [9] were experimenting with affine camera models. They developed a method using matrix factorization that computes extrinsic and intrinsic parameters simultaneously under the orthographic or paraperspective model (see 4.5.1 for a brief introduction).

Both bundle-adjustment and matrix factorization methods have been extended to the perspective model, with limited success. Detailed analysis of these algorithms will be

given in sections that follow. In conclusion, a formulation for general multi-view structure-from-motion with uncalibrated cameras is still an ongoing research issue.

5.1 Factorization-based Methods

5.1.1 Factorization-based reconstruction for affine cameras

This problem was tackled by Tomasi and Kanade [9], and Poelman and Kanade [7]. Here, we adopt the derivation of Poelman et al. with slight modifications to be consistent with our presentation for perspective cameras in later sections. A point whose 3-D location is $X_j$ (j=1,…,n) is observed in an affine camera whose projection matrix is $P_i$ (i=1,…,m) at image coordinates $(u_{ij}, v_{ij})$ such that

$$[u_{ij}, v_{ij}]^\top = P_i X_j + [x_i, y_i]^\top,$$

where $(x_i, y_i)$ is the image of the world origin. Writing in a matrix form, we have

$$W = \begin{bmatrix} u_{11} & \cdots & u_{1n} \\ v_{11} & \cdots & v_{1n} \\ \vdots & & \vdots \\ u_{m1} & \cdots & u_{mn} \\ v_{m1} & \cdots & v_{mn} \end{bmatrix} = \begin{bmatrix} P_1 \\ \vdots \\ P_m \end{bmatrix} [\,X_1 \cdots X_n\,] + \begin{bmatrix} x_1 \\ y_1 \\ \vdots \\ x_m \\ y_m \end{bmatrix} [\,1 \cdots 1\,] \qquad (5.1)$$

where W is called the measurement matrix. Without loss of generality, it is assumed that the world center is the center of mass of the points, thus $\sum_j X_j = 0$. Adding all columns on both sides of (5.1) and dividing each sum by n, we have $x_i = \frac{1}{n}\sum_j u_{ij}$ and $y_i = \frac{1}{n}\sum_j v_{ij}$ for all i=1,…,m. Therefore, we can compute $x_i$ and $y_i$ immediately from the

image data. Once they are known, we subtract them from W, giving the registered measurement matrix

$$\tilde{W} = \begin{bmatrix} \tilde{u}_{11} & \cdots & \tilde{u}_{1n} \\ \tilde{v}_{11} & \cdots & \tilde{v}_{1n} \\ \vdots & & \vdots \\ \tilde{u}_{m1} & \cdots & \tilde{u}_{mn} \\ \tilde{v}_{m1} & \cdots & \tilde{v}_{mn} \end{bmatrix} = \begin{bmatrix} P_1 \\ \vdots \\ P_m \end{bmatrix}_{2m \times 3} [\,X_1 \cdots X_n\,]_{3 \times n} \qquad (5.2)$$

where $\tilde{u}_{ij} = u_{ij} - x_i$ and $\tilde{v}_{ij} = v_{ij} - y_i$. This can be summarized into the following rank theorem for affine cameras:

Theorem 1. Under affine projection, the rank of the registered measurement matrix is at most 3.

Actually, except for the degenerate cases where all points are either collinear or coplanar, the registered measurement matrix is of rank 3. Thus, in general cases, a Rank-3-Factorization of it gives the cameras' projection matrices and the points' coordinates in one step. Such a factorization can be implemented using Singular Value Decomposition (see Appendix A.1). Since it is not our focus to study the degenerate cases, in the following, we will assume that all the points are neither collinear nor coplanar in space.

5.1.2 Extension to perspective cameras

To extend the previous results to perspective cameras, one naturally defines the measurement matrix as:

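$$W = \begin{bmatrix} u_{11} & \cdots & u_{1n} \\ v_{11} & \cdots & v_{1n} \\ 1 & \cdots & 1 \\ \vdots & & \vdots \\ u_{m1} & \cdots & u_{mn} \\ v_{m1} & \cdots & v_{mn} \\ 1 & \cdots & 1 \end{bmatrix}, \qquad (5.3)$$

and then asks the question: is W of rank 4? It turns out that this is true only under a special situation. But we delay the discussion till section 5.2. For now, let us first look at the generic case.

Suppose we have a perspective camera whose projection matrix is $P_i$ (3×4), and a point in some projective space whose homogeneous coordinates are $X_j$ (4×1). Using the symbol ~ to denote equality up to a scale (see section 2.1.1), we have $[u_{ij}, v_{ij}, 1]^\top \sim P_i X_j$, or $\lambda_{ij} [u_{ij}, v_{ij}, 1]^\top = P_i X_j$, for i=1,…,m and j=1,…,n, where $\lambda_{ij}$ is a scale factor which, under normal circumstances, depends on both the camera and the point. The equivalent matrix form is:

$$W_s = \begin{bmatrix} \lambda_{11} [u_{11}, v_{11}, 1]^\top & \cdots & \lambda_{1n} [u_{1n}, v_{1n}, 1]^\top \\ \vdots & & \vdots \\ \lambda_{m1} [u_{m1}, v_{m1}, 1]^\top & \cdots & \lambda_{mn} [u_{mn}, v_{mn}, 1]^\top \end{bmatrix} = \begin{bmatrix} P_1 \\ \vdots \\ P_m \end{bmatrix} [\,X_1 \cdots X_n\,] \qquad (5.4)$$

where $W_s$ is called the scaled measurement matrix. A similar conclusion thus holds for perspective cameras, which is expressed as: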

Theorem 2. There exists a set of scale factors, all non-zero, such that the corresponding scaled measurement matrix is of rank 4 under non-degenerate cases.

Consequently, for points in general positions, a Rank-4-Factorization of the scaled measurement matrix produces a projective reconstruction of the points. The theorem also suggests that the key step in the perspective case is to recover the appropriate scale factors. Sturm et al. [86] proposed a progressive method by computing the fundamental matrix between each successive pair of views. Starting from the first view with $\lambda_{11} = \lambda_{12} = \cdots = \lambda_{1n} = 1$, the method finds $\lambda_{i1}, \lambda_{i2}, \ldots, \lambda_{in}$ for i=2,…,m progressively using the fundamental matrix $F_{i,i-1}$ between the (i-1)-th and the i-th views:

$$\lambda_{i,j} = \frac{(o_{i,i-1} \times q_{i,j}) \cdot (F_{i,i-1}\, q_{i-1,j})}{\| o_{i,i-1} \times q_{i,j} \|^2}\; \lambda_{i-1,j}$$

where $o_{i,i-1}$ is the epipole, and $q_{ij} = [u_{ij}, v_{ij}, 1]^\top$. The algorithm thus requires that the epipolar geometry be known or estimated at the beginning.

5.1.3 The Iterative Factorization Algorithm (IFA)

Sturm's method requires the fundamental matrix, which causes two problems in practice. First, there exist special camera configurations where the fundamental matrix becomes degenerate, e.g. all cameras are parallel and co-planar. Second, estimation of the fundamental matrix is still experimental, normally cast as a constrained non-linear

minimization problem which needs initialization and may incur numerical instability. The Iterative Factorization Algorithm that we are proposing avoids all these problems.

Let $\sigma_1, \sigma_2, \sigma_3, \sigma_4, \sigma_5$ be the five largest singular values of $W_s$, and let the rank-4-ness of $W_s$ be measured by $\sigma_5$. The solution to (5.4) can be obtained by:

$$\min_{\lambda_{ij}\ (1 \le i \le m,\ 1 \le j \le n)} \sigma_5. \qquad (5.5)$$

It is quite difficult to find an analytical solution to (5.5). Instead, a heuristic algorithm is presented here which, as verified experimentally, does minimize $\sigma_5$. The two major steps of the algorithm are (i) to perform the rank-4-factorization on W (Singular Value Decomposition, Appendix A.1), obtaining $W_4$; and (ii) to rescale W with the coefficients of $W_4$ to make it closer to rank 4. We repeat this factorization and rescaling process, trying to bring W closer and closer to rank 4 until $\sigma_5$ reaches a predefined small value. This process is depicted in Figure 5.1. For reference, we call it the Iterative Factorization Algorithm (IFA). The pseudo code of the algorithm is summarized in Figure 5.2.

Figure 5.1 Illustration of IFA
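The factorize-then-rescale loop described above can be sketched in NumPy as follows. This is an illustrative variant under stated assumptions rather than the thesis code: the scale update is the least-squares projection of each rank-4 3-tuple onto its measured direction $[u, v, 1]$ (which may differ from the exact update formula in the pseudo code), the matrix is renormalized every iteration to keep the scale factors from shrinking, the stopping test is relative ($\sigma_5 < \varepsilon\,\sigma_1$), and `synthetic_views` generates hypothetical test data in normalized image coordinates.

```python
import numpy as np

def stack(lam, q):
    """Build the 3m x n scaled measurement matrix from lam (m, n) and q (m, n, 3)."""
    m, n, _ = q.shape
    return (lam[:, :, None] * q).transpose(0, 2, 1).reshape(3 * m, n)

def rank4_residual(W):
    """Relative energy outside the four largest singular values."""
    s = np.linalg.svd(W, compute_uv=False)
    return np.sqrt(np.sum(s[4:] ** 2) / np.sum(s ** 2))

def ifa(q, max_iter=200, eps=1e-8):
    """Estimate scale factors lam (m, n) for homogeneous pixels q (m, n, 3)."""
    m, n, _ = q.shape
    lam = np.ones((m, n))
    for _ in range(max_iter):
        lam = lam / np.linalg.norm(stack(lam, q))      # fix the overall scale
        U, s, Vt = np.linalg.svd(stack(lam, q), full_matrices=False)
        if s[4] < eps * s[0]:
            break
        W4 = (U[:, :4] * s[:4]) @ Vt[:4]               # best rank-4 approximation
        b = W4.reshape(m, 3, n).transpose(0, 2, 1)     # back to (m, n, 3) tuples
        lam = np.sum(q * b, axis=2) / np.sum(q * q, axis=2)
    return lam

def synthetic_views(m=4, n=30, seed=0):
    """Hypothetical data: m slightly rotated cameras observing n random points."""
    rng = np.random.default_rng(seed)
    X = np.vstack([rng.uniform(-1, 1, (3, n)), np.ones((1, n))])
    q = np.empty((m, n, 3))
    for i in range(m):
        a = np.radians(rng.uniform(-20, 20))
        c, s = np.cos(a), np.sin(a)
        R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
        t = np.array([rng.uniform(-1, 1), rng.uniform(-1, 1), 5.0])
        x = np.hstack([R, t[:, None]]) @ X
        q[i] = (x / x[2]).T
    return q
```

Viewed geometrically, each pass alternates between the set of rank-4 matrices and the linear space of matrices whose 3-tuples are multiples of the measured pixels, so the relative residual cannot increase from one pass to the next.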

Algorithm 2. Iterative Factorization Algorithm (IFA)

Repeat {
    Perform SVD on W: W = UΣV^T;
    Obtain Σ4 from Σ by setting all singular values except the 4 largest ones to zero;
    Let W4 = UΣ4V^T;
    Let the 3-tuples corresponding to the i-j th element (as defined in (4.3)) of W and W4 be a_ij and b_ij respectively;
    Compute the i-j th scale factor for W as: s_ij = (a_ij^T b_ij) / (b_ij^T b_ij);
    Update W with the new scale factors.
} Until (σ5 < ε)

Figure 5.2 Pseudo code of IFA

It is interesting to see the difference between IFA and a similar algorithm proposed by Berthilsson et al. [9]. Let $W = U \Sigma V^\top$ be the singular value decomposition of W. Denote the first four columns of U and V respectively by $U_4$ and $V_4$; then $P = U_4\, \mathrm{diag}(\sigma_1, \sigma_2, \sigma_3, \sigma_4)$ gives the camera matrices and $X = V_4$ gives an approximate projective reconstruction. Notice that here X consists of four orthonormal columns. Writing $q_j = [u_j, v_j, 1]^\top$, the updating formula of IFA is:

$$[\,\lambda_1^{(k+1)} q_1 \ \cdots\ \lambda_n^{(k+1)} q_n\,] = P X^\top = [\,\lambda_1^{(k)} q_1 \ \cdots\ \lambda_n^{(k)} q_n\,]\,(X X^\top) \qquad (5.6)$$

where k is the iteration index. On the other hand, according to [9], Berthilsson's updating formula is:

$$[\,\lambda_1^{(k+1)} q_1 \ \cdots\ \lambda_n^{(k+1)} q_n\,]\,(I - X X^\top) = 0 \qquad (5.7)$$

which is equivalent to

$$[\,\lambda_1^{(k+1)} q_1 \ \cdots\ \lambda_n^{(k+1)} q_n\,] = [\,\lambda_1^{(k+1)} q_1 \ \cdots\ \lambda_n^{(k+1)} q_n\,]\,(X X^\top), \qquad (5.8)$$

with $q_j = [u_j, v_j, 1]^\top$. In some sense, (5.8) concatenates two steps of (5.6). Therefore Berthilsson's algorithm converges faster. However, while the time complexity of solving for the scale factors from (5.6) is O(n) because each of them is computed individually, that of solving (5.7) is O(n³) because all factors are solved simultaneously (which amounts to solving a homogeneous 3n×n equation). This analysis makes IFA practically more applicable.

5.2 Rank Theorem Under The Common-Image-Plane Condition

In the previous section, algorithms that compute the scale factors using matrix factorization were discussed. An unanswered question is whether there exist cases where the computation of the scale factors is unnecessary. Asked differently: are there cases where the measurement matrix itself is of rank 4? To answer that question, some preparations are necessary. First is the following lemma.

Lemma 1. Let $W_s$ be a scaled measurement matrix with the scale factors $\lambda_{ij}$ as defined in Theorem 2, for i=1,…,m and j=1,…,n. Then the corresponding measurement matrix W

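is rank 4 if and only if there are m+n independent coefficients $p_i$, $1 \le i \le m$, and $x_j$, $1 \le j \le n$, where $p_i$ is related to the i-th camera and $x_j$ is related to the j-th point, such that $\lambda_{ij} = p_i x_j$.

Proof. Sufficient condition. If we pre-multiply $W_s$ by the 3m×3m diagonal matrix $D_p = \mathrm{diag}(p_1^{-1} I_3, \ldots, p_m^{-1} I_3)$ and post-multiply it by the n×n diagonal matrix $D_x = \mathrm{diag}(x_1^{-1}, \ldots, x_n^{-1})$, we have

$$W = D_p\, W_s\, D_x = \begin{bmatrix} P_1 / p_1 \\ \vdots \\ P_m / p_m \end{bmatrix} \left[\, \frac{X_1}{x_1} \ \cdots\ \frac{X_n}{x_n} \,\right],$$

which indicates that W is of rank 4.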
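The sufficiency argument can be checked numerically with a small sketch. The matrices below are random stand-ins rather than real camera data: `Ws` plays the role of a rank-4 scaled measurement matrix, `lam_sep` is a separable set of scale factors $p_i x_j$, and `lam_gen` is a generic non-separable set. The tolerance passed to `matrix_rank` is an arbitrary choice for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 12

# Random rank-4 matrix standing in for the scaled measurement matrix W_s.
Ws = rng.normal(size=(3 * m, 4)) @ rng.normal(size=(4, n))

p = rng.uniform(0.5, 2.0, m)               # one coefficient per camera
x = rng.uniform(0.5, 2.0, n)               # one coefficient per point
lam_sep = np.outer(p, x)                   # lambda_ij = p_i * x_j
lam_gen = rng.uniform(0.5, 2.0, (m, n))    # generic, non-separable factors

def unscale(Ws, lam):
    """Divide every 3-row block i and column j of Ws by lam[i, j]."""
    return Ws / np.repeat(lam, 3, axis=0)

rank_sep = np.linalg.matrix_rank(unscale(Ws, lam_sep), tol=1e-9)
rank_gen = np.linalg.matrix_rank(unscale(Ws, lam_gen), tol=1e-9)
```

With separable factors the unscaled matrix keeps rank 4, whereas generic factors raise the rank, which is the situation the scale-factor estimation has to undo.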

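Necessary condition. Let the Rank-4-Factorization of W be

$$W = \begin{bmatrix} P'_1 \\ \vdots \\ P'_m \end{bmatrix} [\,X'_1 \cdots X'_n\,].$$

Since the $X_j$'s and the $X'_j$'s are two projective reconstructions of the same set of points, they are equivalent up to a 4×4 non-singular collineation H (section 2.2.2). This means that there exist n non-zero factors $x_1, \ldots, x_n$ such that

$$\left[\, \frac{X_1}{x_1} \ \cdots\ \frac{X_n}{x_n} \,\right] = H\, [\,X'_1 \cdots X'_n\,].$$

Therefore

$$W = \begin{bmatrix} P'_1 \\ \vdots \\ P'_m \end{bmatrix} H^{-1} H\, [\,X'_1 \cdots X'_n\,] = \begin{bmatrix} \hat{P}_1 \\ \vdots \\ \hat{P}_m \end{bmatrix} \left[\, \frac{X_1}{x_1} \ \cdots\ \frac{X_n}{x_n} \,\right], \qquad \hat{P}_i = P'_i H^{-1}.$$

Consider the two projection matrices of the i-th camera. The original definition says that $\lambda_{ij} [u_{ij}, v_{ij}, 1]^\top = P_i X_j$. Hence $P_i$ and $\hat{P}_i$ both project $X_1, \ldots, X_n$ to $(u_{i1}, v_{i1}), \ldots, (u_{in}, v_{in})$. As a result, they should only differ by a scale factor. Denote the factor $p_i$, and write $P_i = p_i \hat{P}_i$. Knowing this is true for all $1 \le i \le m$, we have

$$\lambda_{ij} [u_{ij}, v_{ij}, 1]^\top = P_i X_j = p_i \hat{P}_i X_j = p_i x_j [u_{ij}, v_{ij}, 1]^\top \quad \text{for } 1 \le i \le m,\ 1 \le j \le n,$$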

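which implies $\lambda_{ij} = p_i x_j$, for $1 \le i \le m$, $1 \le j \le n$. It follows from the derivation that $p_i$ is related to the i-th camera, and $x_j$ is related to the j-th point.

As the second preparation, we need some results on the geometrical interpretation of the rigid transformation (represented by a rotation matrix R and a translation vector T) that relates the camera coordinate system ($X_c$) and the world coordinate system ($X_w$):

$$\begin{bmatrix} X_c \\ 1 \end{bmatrix} = \begin{bmatrix} R & T \\ 0^\top & 1 \end{bmatrix} \begin{bmatrix} X_w \\ 1 \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} X_w \\ 1 \end{bmatrix} = \begin{bmatrix} R^\top & -R^\top T \\ 0^\top & 1 \end{bmatrix} \begin{bmatrix} X_c \\ 1 \end{bmatrix}.$$

Writing R in its row vectors,

$$R = \begin{bmatrix} R_{(1)} \\ R_{(2)} \\ R_{(3)} \end{bmatrix}, \quad \text{we have} \quad R^\top [0, 0, 1]^\top = R_{(3)}^\top.$$

Since $[0, 0, 1]^\top$ represents the camera's principal axis in $X_c$, $R_{(3)}$, the third row of R, represents the same axis in $X_w$. Furthermore, since $R\,0 + T = T$,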

T represents the world origin in the camera coordinate system. In particular, the third component of T is the translation in the direction of the principal axis (the Z axis of $X_c$).

As the final preparation, we introduce the common-image-plane condition. In Figure 5.3, $O_i$ indicates the origin of a camera coordinate system, O indicates the world origin, and $R_{i(3)}$ represents a camera's principal axis. It can be observed that the common-image-plane condition is equivalent to the following two requirements:

1) All the camera principal axes are parallel.
2) The common image plane is perpendicular to the common principal axis.

Violating either of them violates the common-image-plane condition.

Figure 5.3 The common-image-plane condition. $X_c$: Camera Coordinate System; $X_w$: World Coordinate System; $R_{1(3)}$, $R_{2(3)}$, $R_{3(3)}$: the parallel principal axes; $O_1$, $O_2$, $O_3$: the coplanar projection centers.

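Now, we are ready to answer the question: under what situation is W itself of rank 4?

Theorem 3. The measurement matrix is of rank 4 if and only if the cameras satisfy the common-image-plane condition.

Proof. Let $A_i$ represent the internal camera parameters, and $R_i$ and $T_i$ be the external parameters. The projection process is

$$\begin{bmatrix} u_{ij} \\ v_{ij} \\ 1 \end{bmatrix} \sim P_i X_j = A_i [\,R_i \mid T_i\,] X_j = \begin{bmatrix} a_u & k & u_0 \\ 0 & a_v & v_0 \\ 0 & 0 & 1/f_i \end{bmatrix} \begin{bmatrix} R_{i(1)} & t_{i(1)} \\ R_{i(2)} & t_{i(2)} \\ R_{i(3)} & t_{i(3)} \end{bmatrix} X_j = \frac{R_{i(3)} X_j + t_{i(3)}}{f_i} \begin{bmatrix} u_{ij} \\ v_{ij} \\ 1 \end{bmatrix} \qquad (5.9)$$

(in the product $R_{i(3)} X_j$, $X_j$ stands for the inhomogeneous coordinates of the point). From the second preparation, we know that $R_{i(3)}$ is actually the camera's principal axis in the world coordinate system, and $t_{i(3)}$ is the translation of the world origin in this direction. If the scale factor $\lambda_{ij}$ is chosen to be $(R_{i(3)} X_j + t_{i(3)}) / f_i$, then (5.9) is an indication that the resulting scaled measurement matrix is of rank 4. From the lemma, it can be inferred that the sufficient and necessary condition for the corresponding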

measurement matrix to be rank 4 is that $\lambda_{ij} = (R_{i(3)} X_j + t_{i(3)}) / f_i$ can be decomposed into two terms, such that one of them is related to the i-th camera, and the other is related to the j-th point. This is true if and only if (i) $R_{1(3)} = R_{2(3)} = \cdots = R_{m(3)} = V$, and (ii) $t_{1(3)} = t_{2(3)} = \cdots = t_{m(3)} = t$, so that $\lambda_{ij} = (1/f_i)(V X_j + t) = (1/f_i)\, z_j$. The first condition implies that all principal axes are parallel (with the common direction V). The second one implies that the world origin has the same distance t to all cameras along V. It is not hard to verify that these two conditions in fact amount to the common-image-plane condition.

Notice that in a general camera configuration, the dot product appearing in $\lambda_{ij}$ prevents it from being decomposed into two terms related respectively to a camera and a point. Consequently, there exists no Rank-4-Factorization for the measurement matrix in the general case. On the other hand, the above special configuration is not merely of theoretical interest. Any two or three views can be rectified into such a configuration. We have seen the two-view case in section 4.3 as part of our stereo reconstruction algorithm, where we call it rectification.

To confirm the theory, an experiment was conducted on the CIL- dataset provided by the Calibrated Imaging Laboratory of Carnegie Mellon University, where one camera was mounted on a robot arm. We took seven coplanar views from the set (views 1, 2, 3, 4, 5 and two others). The first five views are on the same baseline, that is, the cameras are collinear. The last two are on another baseline, which is above the first one. Camera calibration data is available. The average reconstruction error (in mm) vs. the number of views is plotted in Figure 5.4, where the solid curve corresponds to the view sequence of

1,…,5 followed by the two other views, and the dashed curve corresponds to the two other views followed by 1,…,5. It is observed that the error decreases as the number of views increases, which indicates that some random errors cancel out. We also computed the camera positions. Since there is no ground truth for them, we only list in Table 2 the relative camera movement with respect to the first camera, knowing that each of them moves either horizontally or vertically in a 2mm interval.

Figure 5.4 Average reconstruction error of the CIL- dataset

Table 2. Recovered relative camera movement (mm). Columns: X, Y, Z; one row per camera.

5.3 A Geometric Explanation

In [92], Triggs introduced what he called the Joint Image Theory, which gives us an insight into the projective reconstruction problem in a geometrical sense. Let us look at equation (5.4) and denote the 4 dimensional space spanned by the $X_j$'s as Σ. The equation can be interpreted in two ways: n 4-D points in Σ are projected to m algebraically 3-D spaces, or the same points are projected to an abstract 3m-D joint image space which we denote Ω. Accordingly, the m 3×4 projection matrices form one 3m×4 joint projection matrix ϕ: Σ → Ω. The range of ϕ only occupies a 4 dimensional sub-space Ω4 in Ω. In fact, a more precise definition for ϕ should be ϕ: Σ → Ω4. From this point of view, $W_s$ is merely a sample of Ω4. Projective reconstruction is nothing but choosing a basis for Ω4 (which defines ϕ) and projecting $W_s$ onto this new basis, which generates the homogeneous coordinates for the points. This corresponds to a standard engineering problem called Principal Component Analysis (PCA) and is normally solved using Singular Value Decomposition, which is exactly what has been proposed in [9]. The problem, however,

is that the location of Ω4 in Ω is unknown. What is known is the sub-space spanned by W, which we denote by Ωx. It is thus necessary to warp W from Ωx into Ω4, which can be accomplished by finding the appropriate scale factors. The following duality between the spaces just mentioned and the matrices appearing in our earlier derivation is easily observable:

Ω4 ↔ W_s
Ωx ↔ W
ϕ ↔ P

The dimension of Ω4 is a reflection of the rank-4-ness of $W_s$. Arbitrary camera placement generates a joint-camera whose projection is a map ϕ: Σ → Ωx. The dimension of the range space of ϕ is unknown. Restricting the cameras to have a common image plane results in a map Σ → Ω4. In this sense, the rectification step in binocular stereo in effect accomplishes the same goal as warping the joint image.

5.4 Iterative Methods

5.4.1 Mohr's non-linear minimization method

Adapting the bundle adjustment method to the fully uncalibrated case, Mohr et al. [62] proposed to formulate projective reconstruction as a minimization problem:

$$f = \sum_{i,j} \left[ \left( u_{ij} - \frac{P_{i(1)} X_j}{P_{i(3)} X_j} \right)^2 + \left( v_{ij} - \frac{P_{i(2)} X_j}{P_{i(3)} X_j} \right)^2 \right] \qquad (5.10)$$
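The objective (5.10) is simply the summed squared reprojection error, which a few lines of NumPy make concrete. The array layout below (cameras as an (m, 3, 4) array, points as (n, 4) homogeneous rows, measurements as (m, n, 2)) and the function name are assumptions made for this sketch.

```python
import numpy as np

def reprojection_cost(P, X, uv):
    """Sum over cameras i and points j of the squared 2-D reprojection error.

    P: (m, 3, 4) projection matrices, X: (n, 4) homogeneous points,
    uv: (m, n, 2) measured image coordinates.
    """
    x = np.einsum('iab,jb->ija', P, X)        # (m, n, 3) homogeneous projections
    predicted = x[:, :, :2] / x[:, :, 2:3]    # divide by the third row P_(3) X
    return float(np.sum((uv - predicted) ** 2))
```

The division by $P_{i(3)} X_j$ is what makes the problem non-linear in both the cameras and the points.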

where $P_{i(1)}$, $P_{i(2)}$, $P_{i(3)}$ are the row vectors of $P_i$, with the constraints $\|P_i\| = 1$ and $\|X_j\| = 1$. The formula can be deduced from (5.4) by noticing that $u_{ij} = P_{i(1)} X_j / P_{i(3)} X_j$ and $v_{ij} = P_{i(2)} X_j / P_{i(3)} X_j$. To determine the initial values, five non-coplanar points are assigned standard coordinates [1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,0,1], [1,1,1,1]. It is claimed that the initial values for the remaining data are not critical. For example, projection matrices are all set to the identity matrix and points are set to [0.5, 0.5, 0.5, 0.5].

Mohr's algorithm is not computationally efficient due to the need to invert the associated Jacobian matrix, whose size is polynomially proportional to the number of cameras and points (m²n + mn², where m and n are the number of cameras and points respectively). More severely, our implementation showed that the Jacobian matrix was constantly rank deficient, which indicates that some inherent constraints were not taken into account. Consider the following matrix formed by accumulating column-wise the homogeneous coordinates of l points:

$$A = \begin{bmatrix} x_1 & x_2 & \cdots & x_l \\ y_1 & y_2 & \cdots & y_l \\ z_1 & z_2 & \cdots & z_l \\ w_1 & w_2 & \cdots & w_l \end{bmatrix}.$$

If all these points are collinear, only two of the columns are independent, thus matrix A is of rank 2. In other words, the determinant of any order-3 or order-4 minor of A is zero. If the points are coplanar, the determinant of any order-4 minor is zero. Incorporating these extra conditions into (5.10) is not always desirable. For example, although it is easy to check the collinearity condition, to check for coplanarity, one has to compute the

fundamental matrix. Furthermore, the above constraints amount to degree 3 or 4 polynomials, making the non-linearity of the problem even worse.

5.4.2 Our approach

One common drawback among the factorization-based approaches is that all points must appear in all views. Otherwise, the matrix to be factorized contains "holes": the missing data problem. While addressing this problem, projective bundle adjustment would be an ill-posed problem if it were formulated as in (5.10). Our experiments showed that the source of the degeneracy lies in the attempt to minimize over the projections and the points at the same time. In fact, one can treat (5.10) as a separable problem [25] by alternately holding the $P_i$'s and the $X_j$'s constant. Since the terms involving points are independent of each other (the corresponding Jacobian matrix is block-diagonal), each term (thus each point) can be minimized separately. This is true for the projections as well. Hence, we have a novel way of minimizing (5.10): starting from the initial guess, we repeatedly perform intersection for each point followed by resection for each camera. Furthermore, both sub-problems have least squares solutions, which makes the implementation easy. Eliminating the scale factors in (5.4), we have the following basic equations

$$u_{ij} = \frac{P_{i(1)} X_j}{P_{i(3)} X_j} \quad \text{and} \quad v_{ij} = \frac{P_{i(2)} X_j}{P_{i(3)} X_j}$$

for i=1,…,m, and j=1,…,n, which can also be written in linear form as

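$$\begin{bmatrix} u_{1j} P_{1(3)} - P_{1(1)} \\ v_{1j} P_{1(3)} - P_{1(2)} \\ \vdots \\ u_{mj} P_{m(3)} - P_{m(1)} \\ v_{mj} P_{m(3)} - P_{m(2)} \end{bmatrix}_{2m \times 4} X_j = A X_j = 0 \qquad (5.11a)$$

for all points $X_1, \ldots, X_n$, and

$$\begin{bmatrix} -X_1^\top & 0 & u_{i1} X_1^\top \\ 0 & -X_1^\top & v_{i1} X_1^\top \\ \vdots & \vdots & \vdots \\ -X_n^\top & 0 & u_{in} X_n^\top \\ 0 & -X_n^\top & v_{in} X_n^\top \end{bmatrix}_{2n \times 12} \begin{bmatrix} P_{i(1)}^\top \\ P_{i(2)}^\top \\ P_{i(3)}^\top \end{bmatrix} = B \begin{bmatrix} P_{i(1)}^\top \\ P_{i(2)}^\top \\ P_{i(3)}^\top \end{bmatrix} = 0 \qquad (5.11b)$$

for projections $P_1, \ldots, P_m$. The solution is the null vector of the corresponding coefficient matrix, which in turn can be thought of as minimizing

$$f(X_j) = \sum_i \left\{ \left( u_{ij} P_{i(3)} X_j - P_{i(1)} X_j \right)^2 + \left( v_{ij} P_{i(3)} X_j - P_{i(2)} X_j \right)^2 \right\}, \qquad (5.12a)$$

for all $X_j$'s, and

$$f(P_i) = \sum_j \left\{ \left( u_{ij} P_{i(3)} X_j - P_{i(1)} X_j \right)^2 + \left( v_{ij} P_{i(3)} X_j - P_{i(2)} X_j \right)^2 \right\}, \qquad (5.12b)$$

for all $P_i$'s. The difference between (5.10) and (5.12) is that algebraic errors are minimized in the latter. Therefore the solution from (5.11) may be biased. Hartley [42] suggested a correction for this type of problem by repeatedly reweighting each equation in (5.11) by $1 / (P_{i(3)} X_j)$ until the change of weights becomes negligible (we observed that 5 loops were sufficient in practice). This scheme is implemented in the Weighted Iterative Eigen (WIE) algorithm (Figure 5.5). It is so named because of the way (5.11) is solved: the solution is the eigenvector of $A^\top A$ (or $B^\top B$) corresponding to the smallest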

Figure 5.5 Illustration of WIE. Intersection: for each point $X_j$, initialize weights, solve (5.11a), update weights, go to the next point. Resection: for each camera $P_i$, initialize weights, solve (5.11b), update weights, go to the next camera.

Algorithm 3. Weighted Iterative Eigen (WIE) Algorithm

Initialization;
Repeat {
    For each point X_j, do {
        Assign w_1 = … = w_2m = 1;
        Repeat {
            Reweight each equation in A;
            Solve (5.11a);
            Update weights;
        } Until weights are stable;
    }
    For each projection P_i, do {
        Assign w_1 = … = w_2n = 1;
        Repeat {
            Reweight each equation in B;
            Solve (5.11b);
            Update weights;
        } Until weights are stable;
    }
} Until the solution converges.

Figure 5.6 Pseudo code of WIE for projective reconstruction
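The alternation between intersection and resection can be sketched in NumPy as follows. This is a minimal illustration under several assumptions: both inner loops run a fixed number of reweighting passes instead of testing weight stability, the outer loop runs a fixed number of rounds, the smallest eigenvector of $A^\top A$ is taken from the SVD of A, and the function names and array layout are hypothetical.

```python
import numpy as np

def smallest_eigvec(M):
    """Unit eigenvector of M^T M for the smallest eigenvalue (last right singular vector)."""
    return np.linalg.svd(M)[2][-1]

def intersect(P, uv_j, n_reweight=5):
    """Solve the point equations for one point. P: (m, 3, 4), uv_j: (m, 2)."""
    w = np.ones(len(P))
    for _ in range(n_reweight):
        A = np.concatenate([
            w[:, None] * (uv_j[:, 0:1] * P[:, 2] - P[:, 0]),
            w[:, None] * (uv_j[:, 1:2] * P[:, 2] - P[:, 1])])
        X = smallest_eigvec(A)
        w = 1.0 / (P[:, 2] @ X)               # reweight by 1 / (P_(3) X)
    return X

def resect(X, uv_i, n_reweight=5):
    """Solve the camera equations for one camera. X: (n, 4), uv_i: (n, 2)."""
    w = np.ones(len(X))
    Z = np.zeros_like(X)
    for _ in range(n_reweight):
        B = np.concatenate([
            w[:, None] * np.hstack([-X, Z, uv_i[:, 0:1] * X]),
            w[:, None] * np.hstack([Z, -X, uv_i[:, 1:2] * X])])
        Pi = smallest_eigvec(B).reshape(3, 4)
        w = 1.0 / (X @ Pi[2])
    return Pi

def wie(P0, uv, n_outer=10):
    """Alternate intersection and resection from initial cameras P0. uv: (m, n, 2)."""
    P = P0
    for _ in range(n_outer):
        X = np.array([intersect(P, uv[:, j]) for j in range(uv.shape[1])])
        P = np.array([resect(X, uv[i]) for i in range(uv.shape[0])])
    return P, X

def reprojection_rms(P, X, uv):
    """Root-mean-square 2-D reprojection error."""
    x = np.einsum('iab,jb->ija', P, X)
    return float(np.sqrt(np.mean((uv - x[:, :, :2] / x[:, :, 2:3]) ** 2)))
```

Each inner problem involves only a 2m×4 or 2n×12 matrix, and a point that is missing from a view can be handled by leaving the corresponding pair of rows out of A and B.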

eigenvalue. Besides addressing the rank deficiency problem mentioned earlier, this algorithm has another advantage: the sizes of the involved matrices are drastically reduced. For example, in (5.4), the size of $W_s$ is 3m × n. In (5.10), the size of the Jacobian matrix is 2mn × (12m + 4n). But in WIE, as shown in (5.11), A is 2m × 4 and B is 2n × 12.

5.4.3 Simulation results

Several simulations have been carried out to show that WIE actually converges. The synthetic dataset involves 9 cameras and points. The points are randomly distributed over a sphere of radius R. The cameras, whose internal parameters are set to simulate real cameras with an image size of 4 4, are arranged in two kinds of configurations. In the planar configuration (Figure 5.7), they are perturbed from an initial coplanar-and-parallel configuration (defined in section 5.2) in the direction of the common principal axis. The distance between the plane and the sphere center is Z. The perturbation level is a random variable uniformly distributed between -δZ and δZ, and δ is an input.

Figure 5.7 The planar configuration

In the spherical configuration (Figure 5.8), the orientations of the cameras are specified in Euler angles, randomly chosen between -θ and θ (also a user input). Each camera is translated from the centroid of the points in the direction of its principal axis by the distance Z (Z > R). The effect is to place the cameras on a spherical patch of solid angle θ. Afterwards, the cameras are perturbed along their principal axes by some amount between -δZ and δZ, similar to the planar case.

Figure 5.8 The spherical configuration

The convergence of WIE is illustrated in Figure 5.9, where the residual of (5.) is plotted against the number of iterations. For the spherical configuration, we set δZ=%. To demonstrate that W can be made arbitrarily close to rank 4, the changes of the five largest singular values of W over the iterations are plotted in Figure 5.10. Notice that the first four of them were almost unchanged while the fifth one decreased by several orders of magnitude, confirming the rank-4-ness of the updated W.

Figure 5.9 Convergence of WIE (residual vs. number of iterations; planar configuration with δ = 5%, 10%, 15%, 20%; cylindrical configuration with θ = 5, 10, 15, 20 deg)

Figure 5.10 The five largest singular values over the iterations (log(value) vs. iterations; one panel for the planar configuration, one for the spherical configuration)

Figure 5.11 Comparison of convergence rate (log(residual) vs. number of iterations for IFA and WIE; planar configuration with δ = 5%, 10%, 15%, 20%; cylindrical configuration with θ = 5, 10, 15, 20 deg)

Although in our work IFA is used only to generate initial values, nothing prevents it from being used independently for projective reconstruction (in the full measurement matrix case). Therefore it is interesting to compare the convergence rate between WIE and IFA. Figure 5.11 is the logarithmic plot of the 2-D reprojection error against the number of iterations. In this experiment, the initial values for WIE are generated from a single step of SVD, so that the starting point is the same for both algorithms. The following are the observations:
1) Both IFA and WIE converge linearly except during the first couple of iterations, where the rate is slightly super-linear;
2) WIE converges faster than IFA roughly by a factor of in the planar case, and by a factor of 4 in the spherical case;
3) IFA converges slower in the planar configuration than in the spherical configuration;
4) The general trends of both algorithms are quite consistent under different parameter settings.

5.5 Experimental Results on Noisy Data

This section compares the performance of different projective reconstruction methods on noisy data. In addition to IFA and WIE, we have also implemented a non-linear minimization method (MIN) using the Levenberg-Marquardt algorithm. The comparison is made in terms of the 2-D reprojection error (in pixels). In the experiments described below, except when specifically stated, the initial values result from IFA steps.

5.5.1 Experiments on synthetic data

Gaussian noise of zero mean and standard deviation σ was injected into the generated image data. The 2-D error vs. the noise level is shown in Figure 5.12. It is observed that all three algorithms behave similarly in both configurations. They all exhibit linear degradation against noise. WIE and MIN are almost indistinguishable while IFA is slightly worse (the difference is about one-tenth of a pixel when σ=3). This can be explained by the fact that WIE (5.12) emulates the objective function of MIN (5.5), which is the aggregated reprojection error, while IFA minimizes the rank-4 measurement (5.7), which is a purely algebraic quantity.

We have also performed tests with initial values generated from a single step of SVD and found that, in some cases, MIN fails to converge. Therefore, good initial values are necessary for MIN. WIE, on the other hand, converges in all cases, but sometimes the error is greater than when the initial values come from IFA. This indicates that rough initial values might lead to local minima. Thus, as a precaution, IFA (about a dozen SVD steps) should always be used.

Finally, the average computation times (recorded on an SGI O2 with an R5000 processor) of IFA, WIE and MIN are respectively .39, 5.42 and seconds. This supports our earlier claim about the computational efficiency of the proposed algorithm.
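The evaluation loop described above (inject Gaussian noise, then measure the 2-D reprojection error) can be sketched as follows. This is only an illustration: the function names are hypothetical, and the error is taken here to be the mean Euclidean distance in pixels, which is an assumption since the exact definition is not restated in this section.

```python
import numpy as np

def reprojection_error(projections, points, observations):
    """Mean 2-D distance (pixels) between observed and reprojected points.

    projections:  array of shape (m, 3, 4)
    points:       array of shape (n, 4), homogeneous 3-D points
    observations: array of shape (m, n, 2), image coordinates (u, v)
    """
    projected = np.einsum('mij,nj->mni', projections, points)   # (m, n, 3)
    uv = projected[..., :2] / projected[..., 2:3]               # dehomogenize
    return float(np.mean(np.linalg.norm(uv - observations, axis=-1)))

def add_noise(observations, sigma, rng):
    """Add zero-mean Gaussian noise of standard deviation sigma (pixels)."""
    return observations + rng.normal(0.0, sigma, size=observations.shape)
```

Calling `add_noise` with increasing `sigma` and re-running each reconstruction method on the perturbed observations gives one curve of an error-versus-noise plot.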

Figure 5.12 Reprojection error vs. noise level (2-D error in pixels vs. noise level σ for IFA, WIE and MIN; one panel for the planar configuration, one for the cylindrical configuration)

5.5.2 Experiments on real data

The first experiment is conducted on an eight-frame image sequence of a computer terminal (Figure 5.13). Corners are detected using the Harris corner detector [37] [64], from which 2 are interactively selected and the correspondences established manually. Among the 2 selected points, 7 points appear in all views. Table 3 (4 views) and Table 4 (all 8 views) compare the numerical results of different projective reconstruction methods. The results are consistent with those obtained from synthetic data: both WIE and MIN make small but necessary improvements over IFA, but the running time of WIE is significantly lower than that of MIN.

In the second experiment, 73 points are tracked over 9 images of a wooden house (Figure 5.14, courtesy of Dr. Boubakeur Boufama). Almost half of the points are present in any two successive views. Since IFA requires a full measurement matrix, only WIE and MIN have been tested on this sequence.

5.6 Final Remarks

From the experimental results on both synthetic and real data, we make the following remarks:
1) The accuracy of WIE is only marginally better than that of MIN, but in some cases MIN fails to converge;
2) Generally, WIE outperforms IFA. When it converges, MIN also outperforms IFA;
3) Increasing the variation in camera locations (δZ and θ) enhances reconstruction quality;

4) Among the three algorithms tested, WIE provides the best trade-off in terms of accuracy and efficiency.

Figure 5.13 The terminal sequence

Table 3. Comparison over 4 views (the terminal sequence)

Items               SVD      IFA      WIE      MIN
2D Error (pixel)
# iterations
Run time (second)   .4s      .34s     2.45s    7.6s

Table 4. Comparison over 8 views (the terminal sequence)

Items               SVD      IFA      WIE      MIN
2D Error (pixel)
# iterations
Run time (second)   .36s     .42s     5.2s     76.69s

Figure 5.14 The wooden house sequence

Table 5. Results on the house sequence

Items: WIE, MIN
2D Error (pixel):
# iterations: 33
Run time (second): 23.58s, s

Chapter 6 Application of Projective Reconstruction: View Synthesis

This chapter presents an image-based rendering system for view synthesis. We define the problem as: given images of a static scene taken from different angles, how to reproject and integrate them into a new image as if it were obtained from a viewpoint that is different from any of the source views. Any image-based rendering system must address the following two fundamental issues:
- how to reproject (or warp) existing frames to novel viewpoints
- how to resolve occlusion in the synthesized frame as a result of changing viewpoint

We formulate the warping problem as projective-reconstruction-based image transfer, which is made easy by the algorithms developed in Chapter 5. To deal with the second issue, we divide the images into triangles and reproject them in what we call the visibility compatible order, such that occluded triangles are traversed first. The proposed view synthesis system comprises three steps: triangulate the input images, sort the triangles in the visibility compatible order, and reproject them one by one. Compared to the approaches surveyed in Chapter 3, ours has the following characteristics:
- It imposes no restrictions on the virtual camera placement.
- It accepts multiple source views at sparse angles.

- It uses triangles rather than pixels as rendering primitives.
- It handles occlusion by rendering the triangles in a visibility compatible order.
- It is a geometrical method, and therefore uses a relatively small number of input images.

6.1 Review of Related Work

6.1.1 View morphing

The view morphing approach proposed by Seitz and Dyer [8] draws its origin from image morphing, first introduced in the special effects industry. In image morphing, the goal is to produce continuous transformations from a source image to a destination image. For example, a gentleman is transformed into a lady, or a human face into an animal face. What is critical here is the continuity of the transformation that generates the desired special effect. Whether or not it makes "physical" sense is not the issue. As a matter of fact, in their paper, Seitz and Dyer show that linearly interpolating two perspective views causes a non-linear bending effect in the in-between images. Their observation, then, is that pre-warping the input views to positions parallel to the line connecting the projection centers (frequently called the baseline) makes linear interpolation shape-preserving. They presented a lengthy pre-warping algorithm which achieves the same goal as what we call image rectification in section 4.3.

There are two main limitations in Seitz's method. First, the virtual camera is only allowed to move on the baseline, a direct result of using linear interpolation. Second, the virtual image plane has to be parallel to the baseline. To overcome the latter limitation,

they proposed a post-warping step. However, there is no guarantee that the final image is physically valid, that is, reproducible by a conventional camera.

Figure 6.1 Illustration of the view synthesis approach

6.1.2 View synthesis

Arbitrary virtual view placement is allowed in the view synthesis approach (Havaldar et al. [43]). Recall from section 2.1.6 that four collinear points determine a unique cross-ratio invariant to projective transformation. Conversely, given three points and the cross-ratio, the location of the fourth point can be predicted. The essence of the view synthesis approach is, from the correspondence information between views, to compute a cross-ratio for each scene point. Then, given the images of four base points in a third view, all points can be reprojected, and thus the new image can be synthesized.

In Figure 6.1, A, B, C, D are four non-coplanar points, or base points. O_1 and O_2 are the projection centers. P is an unknown 3-D point whose image is to be reprojected to a novel view. Here, it is assumed that the epipolar geometry has been recovered already, and that o_1 and o_2 are the epipoles. The cross-ratio of P, denoted as cross(O_1, P_1, P_2, P), can be computed from the images of O_1, P_1, P_2, P in the right image plane, that is, cross(o_2, p_1, p_2, p'). The magic here is that one does not need to know the 3-D coordinates of A, B, C, D, or P_1, P_2 in order to compute the cross-ratio. In fact, one can compute p_1 and p_2 based on the epipolar geometry only. Then, all that is required for point transfer are the images of the base points.

A shortcoming of Havaldar's approach is that it relies heavily on the accurate detection of the four base points. Otherwise, there is noticeable distortion in the final synthesis. The selection of these points is empirical in their system.

6.1.3 View synthesis in tensor space

The approach given by Avidan and Shashua [4] is based on the trifocal tensor concept (the tensor approach in the following), and works for three views. The trifocal tensor was first introduced by Shashua in [84]. It describes an algebraic relationship between a point p in the first view, some line s passing through the matching point p' in the second view and some line r passing through the matching point p'' in the third view. In space, this constraint is a meeting between a ray and two planes, as shown in Figure 6.2.

Figure 6.2 Illustration of trifocal tensor

The theory behind the tensor approach is that, given two views in full correspondence and the tensor, the entire third view can be synthesized: from every matching pair p, p' we can obtain p'' and copy the appropriate color value (take the average from the two model images, for example). One nice property of the trifocal tensor is that a rigid transformation can be directly applied to it to generate a new tensor with which the view represented by the transformation can be synthesized. Physical validity and view placement flexibility are achieved at the same time without recovering 3-D models. Compared to the view synthesis approach of Havaldar et al., the tensor approach is more stable because it uses three views and does not rely on any particular set of points.
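For comparison, the cross-ratio mechanism behind the approach of Havaldar et al. can be illustrated numerically. The sketch below parameterizes four collinear points by a scalar position along their line and uses one common cross-ratio convention; the convention of section 2.1.6 is not restated here, so both the formula and the function names are assumptions.

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points given by scalar line coordinates."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

def fourth_point(a, b, c, k):
    """Solve cross_ratio(a, b, c, d) == k for the unknown position d."""
    return (k * (c - b) * a - (c - a) * b) / (k * (c - b) - (c - a))

# The cross-ratio measured in one view survives a projective change of
# coordinates along the line, so it can be used to predict the fourth
# point in another view from the three transferred points.
k = cross_ratio(0.0, 1.0, 3.0, 7.0)
to_new_view = lambda t: (2.0 * t + 1.0) / (0.1 * t + 1.0)   # example 1-D homography
predicted = fourth_point(to_new_view(0.0), to_new_view(1.0), to_new_view(3.0), k)
```

Here `predicted` agrees with `to_new_view(7.0)` up to rounding, which is the "given three points and the cross-ratio, predict the fourth" step of the text.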

When there are only two views, a special tensor can be constructed from the elements of the fundamental matrix. In other words, the synthesis process can start with two model views and their fundamental matrix, as in Havaldar's method, but the later steps follow the tensor machinery. Something that was not explicitly mentioned in the original paper is that rigid transformation only applies to tensors computed from normalized views, those whose internal matrices are the identity matrix. This implies that the cameras must be pre-calibrated.

6.2 System Overview

Figure 6.3 shows the high-level flow-chart of the system, which consists of the following stages.

Stage 1 - Corner extraction and matching
Corner points are extracted and matched semi-automatically. Projective reconstruction is then performed, from which the reprojection error of each point is calculated and displayed. This helps the operator refine those with large errors.

Stage 2 - Edge linking and labeling
Again, in the current implementation, this step is done interactively by simply connecting points and marking the T-junctions.

Stage 3 - Constrained Delaunay Triangulation
A Constrained Delaunay Triangulation (CDT, de Floriani [32]) is then conducted in each source image, which divides the image into triangles. Later, each triangle will be warped

Figure 6.3 System flow-chart (Image 1, Image 2, ..., Image n; human-assisted stages: corner extraction and matching, yielding corners; edge detection and labeling, yielding corners, edges and T-junctions; constrained Delaunay triangulation, yielding triangles; fully automatic stages: triangle sorting and triangle warping)

to the novel view. A CDT is a triangulation that preserves the existing edges. There are two reasons to perform the triangulation. Firstly, this avoids identifying and matching faces, another tedious and not-so-easy task. Secondly, this establishes the potential to implement warping as a texture mapping process to make use of graphics hardware capabilities.

Stage 4 - Triangle sorting
As will be detailed in section 6.4, triangles have to be filled in a certain order so that occlusion is handled correctly in the synthesized view.

Stage 5 - Triangle warping
Warping a triangular facet of a source image includes two steps: the mapping of the triangle's vertices and the filling of pixels within each triangle. In our system, mapping is based on projective reconstruction. Filling is established upon the concept of homography (refer to 2.1.5) and is implemented as projective texture mapping, which is supported in OpenGL, the de facto standard for graphics programming.

In the following sections, we will present our solutions to the two fundamental issues in view synthesis, namely, warping formulation and visibility resolution.

6.3 Warping Formulation

6.3.1 Vertex mapping

In Havaldar et al. [43], vertex mapping is performed through the usage of projective depth, which appeared originally as a tool for object recognition (Shashua [83]). One shortcoming of this formulation is that it requires five points to serve as a projective

basis. The accuracy of these five basis points has great impact on the quality of the synthesis result. Another drawback is that in choosing the basis, it is necessary to make sure that none of the corner points are close to the implicitly defined plane-at-infinity, which, in practice, is rather difficult to maintain.

Mapping using projective reconstruction, on the other hand, avoids both problems. Given n correspondences across m views, a projective reconstruction can be obtained using one of the algorithms from Chapter 5. If the images of at least 6 points in the novel view are given (each point provides 2 equations and the projection matrix has 11 independent elements), that view's projection matrix can be estimated by resection, which is then used to reproject all remaining points. An additional advantage of this formulation is that it uses all image data simultaneously, and is therefore less biased than the projective depth formulation, which uses two views at a time as a result of being based on the epipolar geometry.

6.3.2 Triangle filling

Once the vertices are mapped, all pixels inside the triangles need to be transferred. This is illustrated in Figure 6.4, where the image of triangle ABC in the left is transferred to the right. The transitive property of projection states that any spatial plane (e.g. the one determined by ABC) induces a homography between two image planes. To compute the homography, at least 4 pairs of corresponding points are required. So how can a triangle be transferred? The answer is that, in this case, the fourth pair of matching points is supplied by the epipoles (section 4.2).
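The homography used for triangle filling can be computed from exactly four point pairs with a direct linear transform. The sketch below is a generic DLT written for illustration, not code from the thesis; the function name is hypothetical. In the setting above, the four pairs would be the three triangle vertices in the two images plus the pair of epipoles, all in homogeneous coordinates.

```python
import numpy as np

def homography_from_pairs(src, dst):
    """Return a 3x3 H (up to scale) with dst_k ~ H @ src_k for four pairs.

    src, dst: sequences of four homogeneous 3-vectors, no three collinear.
    """
    rows = []
    for p, q in zip(src, dst):
        p = np.asarray(p, dtype=float)
        u, v, s = (float(c) for c in q)
        # two independent rows of the constraint  q x (H p) = 0
        rows.append(np.concatenate([np.zeros(3), -s * p, v * p]))
        rows.append(np.concatenate([s * p, np.zeros(3), -u * p]))
    # the stacked rows of H span the null space of the 8x9 system
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return vt[-1].reshape(3, 3)
```

Working with homogeneous 3-vectors rather than pixel pairs keeps the routine usable when an epipole lies at infinity (third coordinate zero).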

Figure 6.4 Homography between two image planes. Images of a planar facet are related by a homography. D is the intersection of ABC and the segment O_1 O_2. The images of D are the epipoles.

In projective space, the virtual plane spanned by ABC always has an intersection, denoted as D, with the line connecting the two camera centers (O_1 and O_2). The epipoles are the images of D, which, together with the images of the three vertices, suffice to compute the homography. Knowing that the epipole is the null vector of the fundamental matrix, we give in the next section a method of computing the fundamental matrix from projective reconstruction.

6.3.3 Fundamental matrix computation

In computing the projective reconstruction, we also obtain the projection matrices of the cameras, from which the fundamental matrix relating each pair of views is recovered. For example, denote the two matrices by P_1 = [M_1 t_1] and P_2 = [M_2 t_2], where M_1, M_2 are 3×3 and t_1, t_2 are 3×1, then

$$F = [\,t_2 - M_2 M_1^{-1} t_1\,]_\times \, M_2 M_1^{-1}$$

where $[t]_\times$ is the skew-symmetric matrix such that $[t]_\times a = t \times a$ for an arbitrary vector a. Obviously, computing F this way naturally takes into consideration the rank 2 constraint (section 3.3.1). Furthermore, since the projection matrices are computed from correspondences in all views, the solution is much less biased than the traditional two-view method.

Figure 6.5 Mapping of T-junctions. T_l^n forms a gap with the top, while T_r^n goes beyond it.

6.4 Addressing Partial Occlusion

Partial occlusion is characterized by the appearance of T-junctions: an edge, the intersection of two facets, occluded by a third facet. The difficulty caused by a T-junction is that it is not the image of a real 3-D point. To overcome this problem, we propose a 2-step solution, referring to Figure 6.5.

Step 1 - Establishing correspondences for T-junctions

In Figure 6.5, T_l and T_r are the previously mentioned "matched" T-junctions. Let us name the two edges forming a T-junction top and support, as labeled in the figure. Denote the correspondent point of T_l in the right image as T_l'. Since T_l' also lies on the epipolar line of T_l on the right image plane, T_l' is the intersection of the right support and the said epipolar line. Once we have the pair (T_l, T_l'), we can perform bundle adjustment on it, and subsequently reproject the resulting point onto the new view. Denote the reprojection by T_l^n. If the new view is between the two existing views, there is a gap between the projected support and top in the new view. The same thing can be done to T_r, but the projected support intersects with the top, forming an overlap. This suggests that warping be done in a certain order consistent with the occlusion sequence in the new view.

Step 2 - Ordering triangular facets

If Euclidean information were available, the order would be easily determined by comparing the depth value. How do we do it in projective space, where the concept of distance does not apply any more? We know that occlusion is naturally resolved in all source images. So the question becomes whether the order information is somehow encoded in the images, and, if yes, how to extract it.

In Figure 6.6, the to-be-synthesized view is on the left while the right one represents one of the source views. F_1 and F_2 are 3-D facets, f_1 and f_2 are their projections. The darkened line emitting from O_1 is an optical ray. The plain line starting from o_2 is its projection in the right image plane. It is also an epipolar line. If F_1 occludes F_2 in the left view, there exists at least one point on F_1 which occludes a point on F_2 but not vice versa. Otherwise they would intersect, resulting in an edge which separates each of them into

two smaller facets. Consequently, if in the right image plane a point which belongs to f_1 can be found closer to the epipole than a point belonging to f_2, it can be concluded that the occlusion sequence in the left image is f_1 before f_2, thus f_1 should be warped after f_2. All triangles are sorted based on this relationship and are warped accordingly. We say the order determined this way is visibility compatible, the same terminology used by McMillan et al. in [6]. Notice that it is only valid with respect to the given views. If any of them is changed, the epipole is changed, and hence the order. However, a perturbation to an existing configuration is merely a perturbation to the existing order, which can be quickly rearranged. As a result, this sorting algorithm is very efficient for walk-through type applications.

In the implementation, we only compare triangles which share a vertex or an edge. If two triangles share an edge, we compare the opposite vertices. If they share only a vertex, the remaining vertices are checked.

Figure 6.6 Visibility compatible order
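The pairwise comparison described above can be sketched for two neighbouring triangles in a source image. This is an illustrative reading of the rule, not the thesis code: the function names are hypothetical, the epipole is assumed to be a finite pixel position, and taking the nearest of the non-shared vertices when only one vertex is shared is an assumption, since the text only says that the remaining vertices are checked.

```python
import math

def unshared_vertices(tri_a, tri_b):
    """Vertices of each triangle that do not belong to the other one."""
    shared = set(tri_a) & set(tri_b)
    rest_a = [v for v in tri_a if v not in shared]
    rest_b = [v for v in tri_b if v not in shared]
    return rest_a, rest_b

def warp_order(tri_a, tri_b, epipole):
    """Return the two triangles in the order in which they should be warped.

    tri_a, tri_b: tuples of three (x, y) vertices sharing an edge or a vertex
    epipole:      (x, y) image of the new camera center in this source view
    """
    rest_a, rest_b = unshared_vertices(tri_a, tri_b)
    dist_a = min(math.dist(v, epipole) for v in rest_a)
    dist_b = min(math.dist(v, epipole) for v in rest_b)
    # the triangle closer to the epipole is the occluder, so it is warped last
    return (tri_b, tri_a) if dist_a < dist_b else (tri_a, tri_b)
```

Applying `warp_order` to every pair of adjacent triangles yields the precedence relations from which a full visibility compatible order can be built, for example by a topological sort.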

6.5 Addressing Total Occlusion

Total occlusion happens when a front facet in a source view goes to the back in the destination view, completely occluded by other facets. Although the partial ordering algorithm is applicable here, we want to be able to mark those totally occluded facets so that they are ignored during warping, since they are not observable in the new view anyway. It turns out that these facets can be easily identified: when projected to the new view, the orientation of their boundaries is reversed. We can assign an arbitrary vertex sequence (clockwise or counter-clockwise) to each facet, and check whether it is preserved after the projection. If it is not, then we know the facet is at the back when looking from the new camera position. We would like to mention that the same observation has been made by Chung and Nevatia in [9], where the total occlusion is further categorized into orientation-discontinuity occlusion and limb occlusion.

6.6 Results

Figure 6.7 is an example involving two source views: (a) and (b), whose corners, edges and T-junctions are shown in (c) and (d) respectively (circled in solid gray). (e) is a synthesized image. If (a) or (b) were warped individually, there would be gaps (dashed) and overlaps (dotted) as shown in (f) and (g). Figure 6.8 shows four views generated by linearly interpolating three sources (top row of Figure 5.13). Although the synthesized views are not physically valid, they do not show any visual abnormality. Figure 6.9 shows a fuller synthesis by using more source views. Figure 6.10 is an example of extrapolation. The left and the center images are the

input, the right is the output. The projective reconstruction formulation works as well in this case, and no special treatment is necessary.

6.7 Some Remarks

Many restrictions associated with the current image synthesis approaches have been removed in ours. For instance, Chen and Williams only experimented on synthetic images where depth information is known. The systems by Seitz and Dyer, and by Havaldar et al., only work for two views. Also, the latter is biased depending on which four points are chosen as the basis points. The tensor approach requires calibrated cameras. In addition, all existing systems are pixel-based. Therefore, they are not able to utilize the hardware rendering acceleration that only accepts polygons as primitives.

Still, our view synthesis system can be improved in at least two aspects. First of all, it needs the capability to establish correspondences automatically. The manual process is only suitable for blocky objects which are represented by a small number of polygonal shapes whose vertices are easily detectable. Secondly, it needs a conventional way to specify the viewing parameters such as rotation and translation. Our system can generate fly-bys, but it cannot generate a view at a specific location and viewing direction. To be able to do this, there is no way around recovering some Euclidean information, which is the topic of the next chapter.
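The orientation test of section 6.5 for totally occluded facets reduces to comparing the sign of a triangle's signed area before and after projection. The following sketch shows that check; the function names are hypothetical and the vertices are assumed to be listed in the same sequence in both views.

```python
def signed_area(tri):
    """Signed area of a 2-D triangle; positive for counter-clockwise order
    in a frame whose y axis points up."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    return 0.5 * ((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))

def is_totally_occluded(src_tri, warped_tri):
    """True when the vertex sequence is reversed by the projection, which
    marks the facet as back-facing in the new view."""
    return signed_area(src_tri) * signed_area(warped_tri) < 0
```

Facets flagged this way can simply be skipped before the visibility compatible sort, which shortens the list of triangles to warp.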

Figure 6.7 Handling occlusion (panels: (a) view 1, (b) view 2, (c), (d), (e), (f), (g))

Figure 6.8 Synthesized views of a computer monitor

Figure 6.9 A fuller synthesis

Figure 6.10 Looking into a room: an extrapolation example

Chapter 7 Multi-view Euclidean Reconstruction

In many applications, projective structures are insufficient. For example, in urban planning, one wants to know the three-dimensional positions of major city buildings. In the special effects industry, computer graphics artists want seamless integration of virtual objects into real film footage. To generate correct visual effects such as lighting and shadow, Euclidean information is a must.

This chapter discusses how to upgrade a projective structure into a Euclidean one. This problem was previously tackled by several authors (e.g. Faugeras et al. [27], Hartley [4], Maybank et al. [58], and Pollefeys et al. [7]) using the self-calibration method, that is, to calibrate the cameras from image data rather than ground-truth data as in the traditional method. Euclidean reconstruction then follows.

The idea of self-calibration was first proposed by Maybank et al. [58] based on the Kruppa equations, each of which describes a quadratic constraint on the dual image of the absolute conic (see section 2.3). A working algorithm was reported by Faugeras et al. in [27]. The major problem of their method is that it is non-linear and lacks a good method to generate the initial guesses. Hartley's approach [4] assumes a fixed rotating camera, and computes the image of the absolute conic. Triggs [95] took a different approach by computing the image of the absolute quadric (recall from 2.3 that it is the dual image of the absolute conic) using quadratic programming (Gill et al. [34]). Neither algorithm allows varying focal length. Variable focal length is allowed in the method proposed by

Pollefeys et al. [7], which is also based on estimating the image of the absolute quadric. More details about some of these methods are discussed later in this chapter.

The approach we take is to estimate the Projective Distortion Matrix (PDM, section 2.2.2) by explicitly making use of the constraints embedded in the pinhole camera model. Camera calibration is achieved by performing QR decomposition (Appendix A.3) on the recovered Euclidean projection matrix. The approach is a generalization of what has been presented in Chapter 4 for two views. Additionally, a closed-form solution is possible if there are at least three views. By avoiding dealing with the absolute conic/quadric, our approach is conceptually simpler. On the other hand, it can be shown that the PDM in fact contains all the elements of the absolute quadric, which relates our method to the existing ones just mentioned.

7.1 PDM Estimation

7.1.1 Algorithm description

The first canonical form (section 2.2.3) of a projective reconstruction is defined such that the fourth column of the projection matrix of the first view vanishes. Assuming the world coordinate system is coincident with that of the first camera, Euclidean reconstruction is equivalent to finding the Projective Distortion Matrix H such that

$$[\,P_1 \;\; 0\,]\, H = A_1 [\, I \;\; 0 \,] \tag{7.1a}$$

and

$$[\,P_i \;\; p_i\,]\, H = \mu_i A_i [\, R_i \;\; T_i \,], \quad \text{for } i = 2, \ldots, m, \tag{7.1b}$$

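where $\mu_i$ compensates for the relative scaling between the views. Since H is defined up to a scale factor, we can set one of its elements to 1:

$$H = \begin{bmatrix} \hat H & \hat h \\ h^T & 1 \end{bmatrix}.$$

Then, (7.1a) becomes

$$[\,P_1 \;\; 0\,]\, H = [\,P_1 \hat H \;\; P_1 \hat h\,] = [\,A_1 \;\; 0\,],$$

which implies $\hat h = 0$ and $\hat H = P_1^{-1} A_1$. Thus

$$H = \begin{bmatrix} P_1^{-1} A_1 & 0 \\ h^T & 1 \end{bmatrix}. \tag{7.2}$$

Plug (7.2) into (7.1b),

$$[\,P_i \;\; p_i\,] \begin{bmatrix} P_1^{-1} A_1 & 0 \\ h^T & 1 \end{bmatrix} = [\,P_i P_1^{-1} A_1 + p_i h^T \;\; p_i\,] = \mu_i [\,A_i R_i \;\; A_i T_i\,],$$

which generates

$$P_i P_1^{-1} A_1 + p_i h^T = \mu_i A_i R_i \tag{7.3a}$$

and

$$p_i = \mu_i A_i T_i. \tag{7.3b}$$

Thus

$$\left(P_i P_1^{-1} A_1 + p_i h^T\right)\left(P_i P_1^{-1} A_1 + p_i h^T\right)^T \sim A_i A_i^T \tag{7.4}$$

which implies 5 equations for each view other than the first one. The total number of equations is thus 5(m-1). Let us now do some number counting based on the different forms of $A_i$: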

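i). Zero-skew only

In this case, the camera model is

$$A_i = \begin{bmatrix} a_i f_i & 0 & u_i \\ 0 & f_i & v_i \\ 0 & 0 & 1 \end{bmatrix}, \quad A_i A_i^T = \begin{bmatrix} a_i^2 f_i^2 + u_i^2 & u_i v_i & u_i \\ u_i v_i & f_i^2 + v_i^2 & v_i \\ u_i & v_i & 1 \end{bmatrix}. \tag{7.5}$$

The number of unknowns is 4m+3: four for each camera, and 3 for h. Therefore 8 views suffice for a solution.

ii). Known aspect-ratio and unknown but fixed principal point (in addition to zero-skew)

In this case,

$$A_i = \begin{bmatrix} a f_i & 0 & u \\ 0 & f_i & v \\ 0 & 0 & 1 \end{bmatrix}, \quad A_i A_i^T = \begin{bmatrix} a^2 f_i^2 + u^2 & u v & u \\ u v & f_i^2 + v^2 & v \\ u & v & 1 \end{bmatrix}. \tag{7.6}$$

There are m+5 unknowns now: one for each camera, the rest for u, v and h. As a result, 3 views suffice for a solution.

iii). Known principal points (in addition to the previous two conditions)

In this case, each image can be pre-translated so that the post-translated principal point becomes (0,0). $A_i$ thus has a diagonal form:

$$A_i = \begin{bmatrix} a f_i & 0 & 0 \\ 0 & f_i & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad A_i A_i^T = \begin{bmatrix} a^2 f_i^2 & 0 & 0 \\ 0 & f_i^2 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{7.7}$$

Now, there are only m+3 unknowns, which requires minimally 2 views for a solution.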

For binocular stereo, it is necessary to know the principal points. The condition on known aspect ratio can be replaced by equal focal length. If a priori knowledge says both conditions hold, then both should be used to help stabilize the computation.

iv). Single fixed camera

This is the case where a single camera is used. While the camera is moving, its intrinsic parameters are fixed:

$$A = \begin{bmatrix} a f & \beta & u \\ 0 & f & v \\ 0 & 0 & 1 \end{bmatrix}, \quad A A^T = \begin{bmatrix} a^2 f^2 + \beta^2 + u^2 & \beta f + u v & u \\ \beta f + u v & f^2 + v^2 & v \\ u & v & 1 \end{bmatrix}. \tag{7.8}$$

Three is the minimum number of views for a solution. If zero-skew can be further assumed (i.e. β=0), then two views are enough.

7.1.2 Relation to existing methods

In section 2.3, it was shown that $\omega^{-1} = A^{-T}A^{-1}$ is the image of the absolute conic Γ, so that $\omega = AA^T$ is its dual image. As a result, the intrinsic camera parameters are related to, and can be computed from, the image of Γ using Cholesky decomposition (Appendix A.2). Following this line of thought, existing methods have been concentrating on recovering ω one way or the other (Hartley [4], Pollefeys et al. [72], Triggs [95]). We now relate our formulation in 7.1.1 to these existing methods.

Notice that (7.3a) can be rewritten as $[\,P_i \;\; p_i\,] \begin{bmatrix} \hat H \\ h^T \end{bmatrix} \sim A_i R_i$, where $\hat H = P_1^{-1} A_1$ is the upper-left 3×3 block of H. Thus

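$$A_i A_i^T = \omega_i \sim [\,P_i \;\; p_i\,] \begin{bmatrix} \hat H \hat H^T & \hat H h \\ h^T \hat H^T & h^T h \end{bmatrix} [\,P_i \;\; p_i\,]^T = [\,P_i \;\; p_i\,]\, \Omega' \,[\,P_i \;\; p_i\,]^T. \tag{7.9}$$

Claim. In (7.9), Ω' is the absolute quadric.

Proof. Recall that under a Euclidean framework, the absolute quadric has the canonical form $\Omega = \begin{bmatrix} I & 0 \\ 0^T & 0 \end{bmatrix}$. With the projective distortion H, Ω is transformed to

$$H \Omega H^T = (H\Omega)(H\Omega)^T = \begin{bmatrix} \hat H \\ h^T \end{bmatrix} \begin{bmatrix} \hat H^T & h \end{bmatrix} \tag{7.10}$$

which can be easily verified to be equal to Ω'.

(7.10) indicates that the rank 3 condition of Ω has been naturally incorporated into our formulation, reducing the constrained minimization (of Pollefeys [72] and Triggs [95]) to a regular minimization problem.

In case the camera is stationary, $T_i = 0$. It follows from (7.3b) that $p_i = 0$. Thus (7.4) becomes

$$\left(P_i P_1^{-1}\right) \omega \left(P_i P_1^{-1}\right)^T \sim \omega, \tag{7.11}$$

which is the basis of Hartley's method with a rotating camera [4]. In this case, however, since h cannot be estimated, Euclidean reconstruction is impossible. A more general form of (7.11) is $H_\infty^{(i)} \omega_1 H_\infty^{(i)T} = \omega_i$, where $H_\infty^{(i)}$ is the homography induced by $\Pi_\infty$ (the plane-at-infinity, section 2.1.2) between the first and the i-th view. When the camera is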

stationary, such as the case of Hartley's, $H_\infty^{(i)}$ can be computed from correspondences of affine points (in front of the camera). Otherwise, correspondences of vanishing points (at infinity) are required. This suggests a stratified approach: affine reconstruction (determining $\Pi_\infty$) followed by Euclidean reconstruction (determining Γ).

7.1.3 Initialization

We present in this section a closed-form solution when there are at least three views, which supplies the initial guesses to the previous non-linear method. The known-principal-point condition is assumed.

From (7.4) and (7.7), four equations: $\beta_{11} = a^2 \beta_{22}$, $\beta_{12} = \beta_{13} = \beta_{23} = 0$, where $\beta_{jk}$ is the j-k-th element of the left hand side of (7.4), in 5 unknowns: $a = f_1^2$, $b = f_1 h_1$, $c = f_1 h_2$, $d = h_3$, $e = h_1^2 + h_2^2 + h_3^2$, can be deduced. With at least three views, the solution is the null vector of the coefficient matrix. Often, it is assumed that the null space is of rank 1. Thus the solution is the eigenvector corresponding to the smallest eigenvalue. This, however, is not true in our case due to the nonlinear relationship among the unknowns:

$$\frac{b^2}{a} + \frac{c^2}{a} + d^2 = e. \tag{7.12}$$

In fact, the dimension of the null space is 2, meaning the true solution is a linear combination of the two basis null vectors: $\mathbf{e} = \mathbf{e}_1 + \lambda \mathbf{e}_2$. Plugging $\mathbf{e}$ into (7.12) results in a cubic polynomial in λ, for which a closed-form solution exists. It has been observed that the cubic equation always has the special solution structure $\lambda_1 \gg \lambda_2 = \lambda_3$, all real, and the wanted one is $\lambda_1$, which is easy to identify.

When there are only two views, a closed-form solution does not exist. However, in the special case of binocular stereo, where it can be assumed that $A_1 = A_2 = A$, $R = I$, and $\mu = 1$, an interactive method such as the one mentioned previously is sufficient.

7.2 Self-calibration

Here we present our self-calibration method, which is aimed at recovering both intrinsic and extrinsic camera parameters given the Euclidean structure. Complete knowledge about the camera parameters forms the basis for object insertion, that is, inserting computer graphics objects into real images, which we are going to cover in section 7.5.

Our formulation is based on (7.1b). Note that at this point H is already computed. The task is to decompose the leftmost 3×4 matrix into the special form represented by the right side. Let us rewrite (7.1b) as

$$[\,P_i \;\; p_i\,] = \hat A_i [\,R_i \;\; T_i\,] = [\,\hat A_i R_i \;\; \hat A_i T_i\,],$$

where $[\,P_i \;\; p_i\,]$ now denotes the product on the left of (7.1b) and $\hat A_i = \mu_i A_i$. This expands into

$$P_i = \hat A_i R_i \tag{7.13a}$$

and

$$p_i = \hat A_i T_i. \tag{7.13b}$$

From (7.13a), we obtain $\hat A_i$ and $R_i$ by performing QR decomposition on $P_i$. Since $A_i(3,3) = 1$, $\hat A_i(3,3) = \mu_i$, thus $A_i = \hat A_i / \hat A_i(3,3)$. $T_i$ is given by $\hat A_i^{-1} p_i$.
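The decomposition step can be sketched with NumPy. Because the triangular factor sits on the left of the rotation, the sketch obtains an RQ factorization from NumPy's QR routine applied to a row-reversed transpose; this particular route, the function name and the sign normalization are assumptions made for illustration, not the procedure of Appendix A.3.

```python
import numpy as np

def decompose_projection(P):
    """Split a Euclidean 3x4 projection matrix into (A, R, T, mu) with
    P = mu * A @ [R | T], A upper triangular and A[2, 2] == 1."""
    M, p = P[:, :3], P[:, 3]
    J = np.flipud(np.eye(3))                 # row-reversal permutation, J @ J = I
    Q, U = np.linalg.qr((J @ M).T)           # (J M)^T = Q U
    A_hat = J @ U.T @ J                      # upper triangular factor
    R = J @ Q.T                              # orthogonal factor, M = A_hat @ R
    # make the diagonal of A_hat positive; assumes a non-singular M
    S = np.diag(np.sign(np.diag(A_hat)))
    A_hat, R = A_hat @ S, S @ R
    mu = A_hat[2, 2]                         # scale, since A[2, 2] must be 1
    A = A_hat / mu
    T = np.linalg.solve(A_hat, p)            # T = A_hat^{-1} p
    return A, R, T, mu
```

If the overall sign of P is such that det(M) is negative, the recovered R has determinant -1; negating P before the call avoids this.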

7.3 Simulation

Experimental results on a synthetic dataset are given in this section. The spherical configuration in Figure 5.8 is used again. The reconstruction quality is measured by the average 3-D alignment error, obtained by aligning the reconstructed points to the ground truth. The calibration error is measured by the average error in focal length estimation. Only relative errors are recorded. For reconstruction, the absolute error is divided by the radius of the sphere on which the points are located. For focal length, the absolute error is divided by the actual focal length of each camera. Each experiment is run repeatedly and the average result recorded. The parameters to be considered are the principal point (PP), noise level (σ), view separation (θ), and the number of views (N).

First, let us take a look at how the principal point varies given random point locations. In Figure 7.1, the image size is 400 × 400. Ideally, the principal point should be at (200, 200). But in fact, it varies between 170 and 230 in both directions. The inaccuracy in principal point estimation is well known [2] [], but has little effect on the reconstruction/self-calibration quality, as will be demonstrated later.

In the next several figures, the relative reconstruction error and relative focal length error are plotted against σ, θ and N respectively. In each case, PP is either unknown but fixed (case of 7.), or known (case ).
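The two relative error measures described above can be written down as a short Python sketch. It assumes the reconstructed points have already been aligned to the ground truth (the alignment itself is not shown), and the function names and sample values are hypothetical.

```python
import math

def relative_reconstruction_error(reconstructed, ground_truth, radius):
    """Mean 3-D distance between aligned point pairs, divided by the sphere radius."""
    dists = [math.dist(p, q) for p, q in zip(reconstructed, ground_truth)]
    return sum(dists) / len(dists) / radius

def relative_focal_error(estimated, actual):
    """Mean over all cameras of |f_est - f| / f."""
    return sum(abs(e - a) / a for e, a in zip(estimated, actual)) / len(actual)

# Hypothetical data: three points on a unit sphere and two cameras.
truth = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
recon = [(1.02, 0.0, 0.0), (0.0, 0.98, 0.0), (0.0, 0.0, 1.02)]
err_3d = relative_reconstruction_error(recon, truth, radius=1.0)
err_f = relative_focal_error([816.0, 784.0], [800.0, 800.0])
```

Multiplying either value by 100 gives the percentage plotted on the "Relative Error (%)" axis.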

Figure 7.1 Locus of principal point in 2 experiments (axes: U, V; view separation = 45 deg)

Figure 7.2 Relative errors vs. noise level (σ) (curves: 3D Known PP, 3D Unknown PP, Focal Known PP, Focal Unknown PP; axes: Noise Level (σ), Relative Error (%); view separation = 45 deg)

Figure 7.3 Relative errors vs. view separation (θ) (curves: 3D Known PP, 3D Unknown PP, Focal Known PP, Focal Unknown PP; axes: View Separation (deg), Relative Error (%); σ = 2)

Figure 7.4 Relative errors vs. number of views (N) (curves: 3D Known PP, 3D Unknown PP, Focal Known PP, Focal Unknown PP; axes: Number of Views, Relative Error (%); view separation = 45 deg, σ = 2)

The following conclusions can be drawn.

1) The estimation of PP is almost meaningless, as it does not affect 3-D reconstruction errors.
2) The estimation of the focal length is also quite inaccurate.
3) When the noise level is low (< 2 pixels), it is better to assume a known PP (e.g. simply let PP be the image center), since fewer parameters are estimated and the quality difference is negligible. This is especially true when one camera is used and sub-pixel feature detection can be achieved.
4) The accuracy of focal length estimation is worse than that of the reconstruction. This indicates that the proposed method is only applicable in areas where an accurate focal length is not required, such as the one to be presented next.

7.4 Results on Real Images

The two sequences, the terminal (Figure 5.3) and the house (Figure 5.4), previously used in projective reconstruction, are used here. The average 3-D alignment errors are listed in Table 6. For the terminal sequence, 8 ground truth points were measured with a Microscribe 3-D digitizer. The house sequence did not include any ground truth points except the 3-D coordinates recovered by the method of Mohr et al. [62]. Consequently, the alignment error computed using these coordinates (Table 6) is expected to be larger than that reported in [62] (which was .5cm).

Figure 7.5 shows three synthetic views of the reconstructed terminal. The boxes indicate the locations of the camera projection centers and the lines are for the principal

axes. Notice that, at this resolution, the Euclidean characteristics such as orthogonality and parallelism are clearly observable. Figure 7.6 shows the reconstruction results of the house sequence.

Table 6. Average Euclidean reconstruction error on real data

Items SVD IFA WIE MIN
Terminal 4.3.9.8.2
Terminal 8.288.6.976.979
House - -.76.84

Figure 7.5 Reconstruction results of the terminal sequence

Figure 7.6 Two views of the reconstructed house

7.5 Application to Object Insertion

One important virtual reality application is the insertion of synthetic objects into real images. This is the technique used to generate stunning visual effects in such blockbuster movies as Titanic, Star Wars: Episode I The Phantom Menace, etc. For this purpose, the camera pose must first be estimated. Typical approaches rely on 3-D position tracking devices, precise calibration, fiducials, or painstaking manual work. Recently, affine invariants have been used to provide calibration-free augmented reality by Kutulakos et al. [5]. However, the affine representation makes it difficult to describe the synthetic objects in a traditional way (i.e. using positions, distances or angles, etc.).

Another aspect of the same problem, which has been largely ignored, is the mutual occlusion (or other geometrical interactions) between the virtual (inserted) and the real (existing in the scene) objects. To make this happen, accurate Euclidean models of the real objects must also be recovered. Using image features such as corners of objects rather than fiducials as the matching tokens, the proposed approach addresses camera estimation and model reconstruction at the same time. Implementation-wise, several advanced computer graphics techniques are also used. Additionally, algorithms have been developed to compensate for the misalignment caused by errors in principal point estimation, which have been demonstrated in Figure 7.1.

The overall workflow of our object insertion system runs in four steps. First, intrinsic camera parameters, camera pose and scene (real) object models are recovered using the previous Euclidean reconstruction and self-calibration methods. Second, computer graphics (synthetic) objects are placed into the world coordinate

system, and virtual cameras are constructed using the recovered intrinsic camera parameters. Third, while the stencil buffer is turned on, the real objects are rendered in black, and the synthetic ones in some chosen color other than black. This draws the synthetic objects onto the screen with correct occlusion and, at the same time, prepares the stencil buffer. Fourth, the depth buffer is turned off and the original image is redrawn. By choosing appropriate logical operations on the stencil buffer, only pixels that have not been drawn so far are drawn with values from the corresponding image. The result shows the synthetic objects occluded by real objects just as if the former were in the original scene (in the sense that the correct geometric relationship is maintained).

Implementation issues

There are two implementation methods. The straightforward one is to use ray tracing. From each pixel in the image plane of an input view, a ray is emitted. If it hits the surface of a synthetic object, the pixel color is replaced by that of the object surface (after lighting calculation). Otherwise, the image pixel is unaltered. The method, however, runs slowly due to the ray tracing process, and real-time animation is impossible.

The method that we have chosen makes use of the rendering pipeline provided by OpenGL, so that hardware acceleration is possible if the machine allows (which is in fact true for most graphics workstations, Windows NT or Unix based). However, the camera parameters obtained from our self-calibration method create severe misalignments between the virtual and real objects, as shown in Figure 7.7.


Figure 7.7 Misalignment of real and virtual objects

The reason is that, in OpenGL, the image center and the principal point must be the same, which unfortunately is not the case when image noise exists. In Figure 7.1, the estimated principal points appear not only different from the image center, but also randomly distributed. To solve the problem, a 3-D affine distortion matrix is introduced just before the camera-to-image projection, as follows. Assume A, R, T are the estimated camera internal and external parameters. Let $A_I$ be the ideal camera that has the same focal length as A but has the image center as its principal point. Then

$$A\,[\,R \mid T\,] = A_I A_I^{-1} A\,[\,R \mid T\,] = [\,A_I \mid 0\,]\,M, \qquad M = \begin{bmatrix} A_I^{-1}AR & A_I^{-1}AT \\ 0^{\mathsf{T}} & 1 \end{bmatrix}.$$

We thus construct the virtual camera using $A_I$, which guarantees correct alignment, and set the world-to-camera projection matrix to M, which keeps the overall projection equal to the original. The final result is shown in Figure 7.8. Now the virtual and real objects are perfectly registered. Essentially, a 3-D affine distortion is intentionally introduced to compensate for the 2-D error.
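The compensation can be checked numerically with a small Python sketch. It is only an illustration under stated assumptions: A is upper triangular with a non-zero diagonal, the image center is taken as (width/2, height/2), and the helper names `matmul`, `inv_upper3` and `compensate`, as well as the sample camera values, are hypothetical.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inv_upper3(U):
    """Inverse of a 3x3 upper-triangular matrix with non-zero diagonal."""
    (a, b, c), (_, d, e), (_, _, f) = U
    return [[1 / a, -b / (a * d), (b * e - c * d) / (a * d * f)],
            [0.0, 1 / d, -e / (d * f)],
            [0.0, 0.0, 1 / f]]

def compensate(A, R, T, width, height):
    """Return (A_I, M) such that [A_I | 0] M equals A [R | T]."""
    A_I = [row[:] for row in A]
    A_I[0][2], A_I[1][2] = width / 2.0, height / 2.0   # principal point at the image center
    D = matmul(inv_upper3(A_I), A)                      # 3x3 distortion A_I^{-1} A
    RT = [R[i] + [T[i]] for i in range(3)]              # 3x4 matrix [R | T]
    M = matmul(D, RT) + [[0.0, 0.0, 0.0, 1.0]]          # 4x4 affine world-to-camera matrix
    return A_I, M

# Hypothetical estimated camera whose principal point is off-center in a 400 x 400 image.
c, s = math.cos(math.radians(20)), math.sin(math.radians(20))
A = [[800.0, 0.0, 183.0], [0.0, 800.0, 221.0], [0.0, 0.0, 1.0]]
R = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
T = [0.1, -0.2, 3.0]
A_I, M = compensate(A, R, T, 400, 400)

lhs = matmul([row + [0.0] for row in A_I], M)               # [A_I | 0] M
rhs = matmul(A, [R[i] + [T[i]] for i in range(3)])          # A [R | T]
```

In a rendering pipeline, `A_I` would determine the perspective projection and `M` would be loaded as the world-to-camera matrix; `lhs` and `rhs` agree up to rounding, which is the equality stated in the text.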
