3D Modeling Using Multi-View Images. Jinjin Li. A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science


3D Modeling Using Multi-View Images

by

Jinjin Li

A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science

Approved August by the Graduate Supervisory Committee:
Lina J. Karam, Chair
Chaitali Chakrabarti
Tolga M. Duman

ARIZONA STATE UNIVERSITY
December

ABSTRACT

There is a growing interest in the creation of three-dimensional (3D) images and videos due to the growing demand for 3D visual media in commercial markets. A possible solution to produce 3D media files is to convert existing 2D images and videos to 3D. The 2D to 3D conversion methods that estimate the depth map from 2D scenes for 3D reconstruction present an efficient approach to save on the cost of the coding, transmission and storage of 3D visual media in practical applications. Various 2D to 3D conversion methods based on depth maps have been developed using existing image and video processing techniques. The depth maps can be estimated either from a single 2D view or from multiple 2D views. This thesis presents a MATLAB-based 2D to 3D conversion system from multiple views based on the computation of a sparse depth map. The 2D to 3D conversion system is able to deal with multiple views obtained from uncalibrated hand-held cameras, without prior knowledge of the camera parameters or the scene geometry. The implemented system consists of techniques for image feature detection and registration, two-view geometry estimation, projective 3D scene reconstruction and metric upgrade to reconstruct the 3D structures by means of a metric transformation. The implemented 2D to 3D conversion system is tested using different multi-view image sets. The obtained experimental results of reconstructed sparse depth maps of feature points in 3D scenes provide relative depth information of the objects. Sample ground-truth depth data points are used to calculate a scale factor in order to estimate the true

depth by scaling the obtained relative depth information using the estimated scale factor. It was found that the obtained reconstructed depth map is consistent with the ground-truth depth data.

ACKNOWLEDGEMENTS

I would like to acknowledge the patience and strong support of my thesis advisor Dr. Lina J. Karam in enabling me to complete my thesis. I would like to thank my committee members Dr. Chaitali Chakrabarti and Dr. Tolga M. Duman for agreeing to be on my committee. I would also like to acknowledge the support provided to me by the members of the Image, Video and Usability (IVU) lab. I would like to acknowledge the valuable assistance provided to me by Srinivas Varadarajan, Qian Xu, Milind Gide, Adithya Vishnu Murthy and Nabil Sadaka. I would like to thank my parents and my family for their strong love and contribution.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER
1. INTRODUCTION
   1.1 Motivation
   1.2 Contributions
   1.3 Thesis Organization
2. RELATED WORK
   2.1 Depth Map from a Single Image
   2.2 Depth Map from Multiple Image Views
3. BACKGROUND
   3.1 Camera Geometry
   3.2 Epipolar Geometry and Fundamental Matrix
   3.3 Properties of Conics and Quadrics
       3.3.1 General conics and quadrics
       3.3.2 The absolute conic and the dual absolute quadric
       3.3.3 The dual image of the absolute conic (DIAC)
   3.4 The Hierarchy of Transformations
4. IMPLEMENTED MULTI-VIEW 2D TO 3D CONVERSION SYSTEM
   4.1 Overview of the Implemented System
   4.2 Scale Invariant Feature Transform (SIFT)
       4.2.1 Scale-space extrema detection
       4.2.2 Keypoint localization
       4.2.3 Orientation assignment
       4.2.4 Keypoint descriptor
       4.2.5 Keypoint matching
       4.2.6 Implementation notes for the SIFT
   4.3 Random Sample Consensus (RANSAC)
       4.3.1 Description of the RANSAC algorithm
       4.3.2 Implementation notes for RANSAC
   4.4 Triangulation
       4.4.1 Introduction of the triangulation method
       4.4.2 Implementation notes for triangulation
   4.5 Bundle Adjustment
       4.5.1 Description of the bundle adjustment algorithm
       4.5.2 Implementation notes for bundle adjustment
   4.6 Metric Upgrade
       4.6.1 Description of metric upgrade
       4.6.2 Implementation notes for metric upgrade
   4.7 Transformation of 3D Points in Metric Reconstruction
   4.8 Important Notes
5. EXPERIMENTAL RESULTS
   5.1 Data Set Description
   5.2 Sparse Depth Map Results
       5.2.1 Results for the Egypt_Temple image set
       5.2.2 Results for the Microsoft_Ballet image set
       5.2.3 Results for the Tempe_Building image set
       5.2.4 Reprojection error results
   5.3 Analysis of the System Performance
       5.3.1 Analysis of the scale factor stability
       5.3.2 Effect of RANSAC on the scale factor stability
       5.3.3 Effect of the reprojection error on the scale factor stability
       5.3.4 Evaluation of the system performance with respect to the ground-truth depth
       5.3.5 Depth reconstruction with a limited number of known ground-truth depth values
6. CONCLUSION
   6.1 Contributions
   6.2 Future Work
REFERENCES

LIST OF TABLES

Table
1. Geometry properties of different types of transformations
2. Average reprojection error threshold for different image sets
3. Number of common feature points and reprojection errors for three image sets
4. 3D points with minimum depth and maximum depth, and ratio of the maximum and minimum depth values in 5 iterations based on the Egypt_Temple image set
5. Statistical analysis of the depth values of the feature points in the Tempe_Building image set for 5 runs based on the average mean square reprojection error across all views
6. Statistical analysis of the depth values of the feature points in the Tempe_Building image set for 5 runs based on the maximum reprojection error across all views
7. MSE between ground-truth and scaled calculated depth values of feature points in the Table image set using different numbers of known ground-truth depth values

LIST OF FIGURES

Figure
1. Pinhole camera geometry
2. An example of a rotation and translation between different projective coordinates
3. Point correspondence geometry between two views
4. Flowchart of the implemented 2D to 3D conversion system
5. Computing the Difference of Gaussian functions at different octaves and scales
6. Detecting maxima and minima in the DOG functions
7. Orientation histogram of pixels in a 4x4 sub-region
8. Creating a keypoint descriptor using a 16x16 surrounding area of a keypoint
9. Illustration of bundle adjustment
10. Illustration of 3D points with respect to the world coordinates and the middle camera coordinates under the metric transformation frame
11. Eight views of Egypt_Temple
12. Eight views of Tempe_Building
13. Eight views of Microsoft_Ballet
14. Eight views of Table
15. Plot of 1/4 of the total number of matching feature points after the SIFT between View 4 and one of the other views in Egypt_Temple
16. Plot of 1/4 of the total number of matching inliers after RANSAC between View 4 and one of the other views in Egypt_Temple
17. Plot of 1/4 of the total number of common feature points in all views of Egypt_Temple
18. Plot of 1/4 of the total number of depth values of feature points on the middle view (View 4) of Egypt_Temple
19. Plot of 1/2 of the total number of matching feature points after the SIFT between View 4 and one of the other views in Microsoft_Ballet
20. Plot of 1/2 of the total number of matching feature points after RANSAC between View 4 and one of the other views in Microsoft_Ballet
21. Plot of common feature points in all views of Microsoft_Ballet
22. Plot of 1/4 of the total number of depth values of feature points in the middle view (View 4) of Microsoft_Ballet
23. An example of wrongly calculated depth values plotted in the middle view (View 4) of Microsoft_Ballet
24. Plot of 1/4 of the total number of matching feature points after the SIFT between View 4 and one of the other views in Tempe_Building
25. Plot of 1/4 of the total number of matching inliers after RANSAC between View 4 and one of the other views in Tempe_Building
26. Plot of 1/4 of the total number of common feature points in all views of Tempe_Building
27. Plot of 1/4 of the total number of depth values of feature points on the middle view (View 4) of Tempe_Building
28. Plot of the depth values for feature points on the middle view (View 4) at two different iterations for the Egypt_Temple image set
29. Plot of 1/5 of the total number of the calculated depth values for feature points in the middle view (View 4) of the Table image set
30. Plot of 1/5 of the total number of ground-truth depth values for feature points in the middle view (View 4) of the Table image set
31. Zero-mean ground-truth depth values and zero-mean calculated depth values of the feature points detected in the Table image set
32. Combined plot of the zero-mean ground-truth depth values and the zero-mean calculated depth values of feature points in the Table image set
33. Zero-mean ground-truth depth values and scaled zero-mean calculated depth values of the feature points detected in the Table image set
34. Combined plot of the zero-mean ground-truth depth values and the scaled zero-mean calculated depth values of feature points in the Table image set
35. Combined plot of the ground-truth depth values and the scaled calculated depth values for feature points in the Table image set using only one known ground-truth depth value
36. MSE between the ground-truth and scaled calculated depth values of feature points in the Table image set using different numbers of known ground-truth depth values
37. Combined plot of the ground-truth and reconstructed depth values of feature points in the Table image set using different numbers of known ground-truth depth values

1. INTRODUCTION

This chapter presents the motivations behind the work in this thesis and briefly summarizes the contributions and organization of the thesis.

1.1 Motivation

In recent years, with the giant leap in image and video processing technologies, the introduction of three-dimensional televisions (3D TVs) [1] into the commercial market is becoming a reality. Nowadays, there are many commercial companies, such as Samsung, Sony, Panasonic and LG, producing 3D TVs. 3D TVs can be more attractive to viewers because they produce stereo scenes, which create a sense of physical real space. 3D vision in humans arises from the fact that the projections of the same point in space onto the two eyes are located at different distances from the center of focus (center of the fovea). The difference between the distances of the two projected points, one on each eye, is called disparity. Disparity information is processed by higher levels of the human brain to produce a sense of the distance of objects in 3D space. A 3D television employs techniques of 3D presentation such as stereoscopic capture, 3D display and 2D-plus-depth map technologies. Due to the success of introducing 3D visual technologies, including 3D games and 3D TVs, to the commercial market, the demand for a wide variety of 3D content such as 3D images, 3D videos and 3D games is increasing significantly. To satisfy this demand, there is an increasing need to create new 3D video content as well as to convert existing 2D videos to 3D format. Converting 2D content

into 3D depends on different 2D to 3D conversion tools. Among the various kinds of 2D to 3D conversion systems, the method that converts 2D content to 3D by generating a depth map is a popular one, as this method is efficient in the coding, transmission and storage of the 3D content. Using multi-view 2D images to estimate the 3D depth information is a typical method in this category. Multiple views of the same scene provide enough information about the 2D scene, and mathematical computer vision methods can be used to estimate the 3D structure. The multi-view based 2D to 3D conversion borrows concepts from various applications including image registration, feature tracking, object localization and structure reconstruction. From the multi-view image sequences, the 2D to 3D conversion system performs object matching between different views in order to estimate the depth map of the 2D scene. Another advantage of the multi-view 2D to 3D conversion method is that the parameters of the camera can be estimated. In addition, the multi-view 2D to 3D conversion method is applicable to various image acquisition methods, especially for image sources captured using hand-held and uncalibrated cameras. It requires less prior information about the camera and the scene for the 3D reconstruction as compared to other 3D reconstruction models.

1.2 Contributions

In this thesis, a 2D to 3D image conversion system is presented. The implemented 3D modeling system is able to process image sets that are obtained using an uncalibrated camera, without knowing information about the ground truth of

the scene and the camera settings. The techniques involved in the implemented 2D to 3D conversion system consist of feature extraction and tracking, image registration, three-dimensional geometry estimation and refinement, camera calibration and scene structure reconstruction. The 2D to 3D conversion system in this thesis uses the scale invariant feature transform (SIFT) to extract the features of objects and register the feature points in different views. The Random Sample Consensus (RANSAC) algorithm is implemented to remove the outliers in the correspondences, so that the inliers for the corresponding feature points and the two-view geometries can be estimated. Triangulation and bundle adjustment are employed later to estimate and refine the projective reconstruction of the 3D scene. Finally, an auto-calibration technique is used to upgrade the projective reconstruction of the structures to metric coordinates. Through these combined techniques, the relative depth information is estimated for feature points among multiple views of the scene. Different multi-view image sets are used to test the 2D to 3D conversion system, and experimental results are presented and analyzed. The 3D sparse depth map produced by the 3D modeling system is compared with ground-truth data, and it is shown how a scale factor can be estimated to recover the ground-truth depth.

1.3 Thesis Organization

This thesis is organized as follows. Chapter 2 introduces the existing methods for 2D to 3D conversion. Chapter 3 presents the background material that is related to the work in this thesis. Chapter 4 describes the main components of the

implemented 2D to 3D conversion system, and implementation details are also discussed. Chapter 5 presents the experimental results of the 2D to 3D conversion system based on different multi-view image sets. Chapter 6 summarizes the contributions of this thesis and proposes future directions of research.

2. RELATED WORK

This chapter summarizes the existing work that is related to 2D to 3D conversion techniques. Section 2.1 describes several major methods that make use of a single image for depth estimation. Section 2.2 summarizes methods that use multiple views to reconstruct the 3D scene.

Many methods have been proposed for 2D to 3D conversion in recent years. One such method is to estimate the depth image from monocular videos or images, and then the original 2D images or videos and the computed depth images are used to obtain the 3D content, through a process known as depth image based rendering (DIBR) [2]. Methods that produce stereo image pairs are presented in [3] [4]. The advantage of these methods, which directly generate stereo image pairs rather than 2D images and their corresponding depth maps (2D+depth), is that they are suitable for many display devices, as they produce an output that can be readily displayed for 3D viewing. The shortcoming of the methods of [3] [4] is that they place many constraints on the camera motion and the image sets, limiting their use in practical implementations. Compared to the methods in [3] and [4], the 2D+depth approach, which estimates a depth map for 2D to 3D conversion, can save a large portion of the transmission bandwidth, since the depth map can be highly compressed. The advantages of using depth maps for 2D to 3D conversion are discussed in [5]. These advantages are among the reasons why the 2D+depth method has become a popular direction in current 2D to 3D conversion research.

2.1 Depth Map from a Single Image

Some proposed 2D to 3D conversion methods are based on a single image to estimate the depth map. Several typical methods are summarized below. Studies of depth values obtained from focus cues are presented in [6] and [7]. In [6], using the relationship between the image blur and the degree of focus of edge pixels, a relative pixel-resolution depth map can be estimated. In the first step, a macroblock depth map is calculated by dividing the image into macroblocks (16x16) and computing the wavelet transform of each macroblock. Then, a 256-level depth map is created by thresholding the local spatial frequencies in a macroblock. This depth map reflects the spatial frequency content in each macroblock. The second step is to create a pixel-based depth map by estimating the degree of defocus of the edge pixels, based on the macroblock depth map. The edge pixels are detected using a multi-scale wavelet transform by finding the local maxima of the scaled wavelet spectrum. To differentiate the type of edge pixels, the Lipschitz regularity [8] of an edge is computed using the decay of the wavelet transform coefficients from a coarser to a finer scale in the neighborhood of the edge pixels. Edge pixels whose Lipschitz regularity lies in a positive range are defocused edges, while edge pixels whose Lipschitz regularity lies in a negative range are focused ones. By combining the edge pixels and the blur degree represented by the Lipschitz regularity in the rows of an image, the sections between two nearby edges can be categorized into different image structures, and are thus assigned depth values according to the depth values in the macroblocks to which the edge

pixels belong. Due to the blocking and stripe artifacts introduced by the method of [6], the work done in [7] provides several new techniques to enhance the depth map. To get the initial depth map, the method of [7] is implemented using overlapping windows of size 16x16 to analyze the frequency energy at each point. The resulting depth map has fewer block artifacts and is smoother. In detecting the edge pixels, a 1D Gaussian function is used as the smoothing function and, among the maxima points, those less than a given threshold are discarded to reduce the noise effect. For the detected edge pixels, the discontinuities in the edges are corrected by searching for possible edge points in the neighborhoods of every edge pixel. Additionally, to correct the depth values for focused foreground objects with uniform color and texture, color-based segmentation is used to modify the depth values.

The image structure is used for depth estimation in [9]. In [9], the image structure is described using two amplitude spectrum descriptions, the global spectral signature and the local spectral signature. The global spectral signature of an image is the mean magnitude of the global Fourier transform over a set of known images. The local spectral signature is the mean amplitude of the local macroblock wavelet transform over a set of known images. The spectral signatures reflect the general structures shared by the image set. The global spectral signature reveals the dominant orientations and textural patterns and is strongly related to the spatial structure of the scene for real-world images. A local

spectral signature, which is computed using Gabor filters, gives information about the dominant orientations and scales in a local region of the image and their mean spatial distribution. Torralba and Oliva in [9] have illustrated that changes in the mean depth of a scene not only affect the slope of the global magnitude spectrum but also change the local orientations and scales. In [9], the first step of depth estimation is to separate man-made structure and natural structure in the considered image by observing the difference in energy distribution across the spatial frequencies of the image. To estimate the mean depth of the image scene, the mean depth is represented as a conditional expectation given the image feature vector, which is composed of the set of statistical measurements derived from the spectral signatures. For a training data set, the joint probability density distribution between the mean depth and the image feature vector can be described using a cluster of Gaussian functions, and the parameters of the Gaussian functions are estimated using the EM algorithm [10]. After this, the mean depth for any new image is estimated based on the image feature statistics.

Some studies use scene classification [11] and stage classification [12] to get the depth value according to the relative depth information in these categories. Contrary to the method used in [9], which uses the mean depth information for structure classification, in [12] the image is classified into one of a limited number of typical 3D scene geometries called stages, and then the depth information is obtained from the stages. Each stage has a unique depth pattern and provides the characteristics of the scene objects such as location and scales. The

initial stages are obtained from a training stage using a large database of images, and 9 categories of stages are derived. A further stage categorization is performed for each stage by analyzing the distribution of gradients in the images. The Gaussian scale-space model is applied to extract the feature information, and the Weibull distribution is used to represent the histograms of the Gaussian derivative filter responses. The parameters of the integral Weibull distribution are estimated using the Maximum Likelihood Estimator (MLE), which fits the histograms of various types of images well. The parameters for the 15 stages are trained using a Support Vector Machine (SVM). For a given image, the image is analyzed using the Gaussian derivative filter and the Weibull distribution to get the feature vectors, and is then fitted into a collection of best matching stages which have unique depth profiles.

2.2 Depth Map from Multiple Image Views

Besides depth estimation from a single view, the depth information can be obtained from multiple views of the scene. The projective reconstruction of a 3D model using the factorization method to get the camera motion and object structure is discussed in [13] [14] [15] [16] and [17]. The work done in [13] builds a measurement matrix from the 2D feature points in multiple views using the 2D point coordinates, and uses the SVD algorithm to decompose the measurement matrix into a product of two matrices, one representing the camera rotation and the other one giving information about the depth of the feature points under the projective transformation. The translation

vectors of the cameras in the multiple views are computed from the average of the rows of the measurement matrix. The metric transformation is computed by enforcing some constraints on the parameters of the camera matrices. To estimate the projection matrix and the 3D points from 2D feature points in multiple views, the work done in [14] and [15] represents the projection relationship among the 2D points, the projection matrix and the 3D points using an arbitrary scale factor, as

$x_{mn} = \lambda_{mn} P_m X_n$   (1)

where $m$ is the view number, $n$ is the index of the 3D point, $\lambda_{mn}$ is the scale factor, $X_n$ is the $n$-th 3D point, $x_{mn}$ is the $n$-th projected 2D point in the $m$-th view, and $P_m$ is the projection matrix of the $m$-th view. In [14], the scale factor $\lambda_{mn}$ is calculated using the fundamental matrices and the epipoles that are estimated from the 2D feature points. Then the 2D feature points in the multiple views are weighted by the scale factor $\lambda_{mn}$ to form the measurement matrix, which is further decomposed using the SVD to produce the projection matrices and the 3D points. In [15], the scale factor $\lambda_{mn}$, the projection matrix and the 3D points are estimated recursively by minimizing the 2D reprojection error using the SVD factorization and the weighted least squares (WLS) algorithm [18]. Some iterative factorization methods that minimize the reprojection error, and which do not require knowledge of the scene geometry, are illustrated in [16] and [17]. Other than using the factorization method for 3D modeling, the depth

information can be generated by searching for correspondences among multi-view images and by applying triangulation to the feature points [19]. A complete system to build visual models from multi-view camera images is presented in [20]. The system can deal with uncalibrated image sequences acquired with a hand-held camera. Additionally, no prior information about the camera calibration and motion is required for this system. The implemented multi-view 2D to 3D conversion system in this thesis is based on the system of [20]. While the authors of [20] provide a general framework without providing details about how to implement the individual components of the 2D to 3D system, this thesis presents a detailed description and analysis of all the implemented components of the system. The feature detection and matching procedure in this thesis is different from that in [20].

3. BACKGROUND

This chapter gives some background knowledge on 3D modeling in computer vision. In Section 3.1, the camera geometry and the pinhole camera model, which are basic to the analysis of 3D systems, are illustrated. In Section 3.2, the epipolar geometry between multiple views and the fundamental matrix are introduced for further use. The dual absolute quadric and its basic properties are described in Section 3.3.

3.1 Camera Geometry

In computer vision, homogeneous representations of lines and points are described as follows. A line passing through the point $(x, y)^T$ can be described as

$ax + by + c = 0$   (2)

So, the vector $l = (a, b, c)^T$ is the homogeneous representation of the line in projective space. Alternatively, the line can be described using any non-zero scalar multiple of the vector $(a, b, c)^T$. Therefore, equation (2) can be written in the form of an inner product:

$(a, b, c)(x, y, 1)^T = l^T x = 0$   (3)

According to (3), a 2D point can be expressed using a three-dimensional vector $(x, y, 1)^T$, whose third element serves as a scale factor. For a more general case, the homogeneous representation $(x, y, z)^T$ of a point denotes the point $(x/z, y/z)^T$ in 2D vector form. Similarly, the three-dimensional point $X = (x, y, z)^T$ can be represented using the homogeneous notation as

$(x, y, z, 1)^T$, and the plane $\pi$ on which $X$ lies is represented in homogeneous form as $\pi = (\pi_1, \pi_2, \pi_3, \pi_4)^T$. A 3D point $X$ lying on the plane $\pi$ satisfies:

$\pi^T X = 0$   (4)

$(\pi_1, \pi_2, \pi_3, \pi_4)(x, y, z, 1)^T = 0$   (5)

A basic camera model is the projective pinhole camera geometry shown in Fig. 1. It is assumed that the camera center is the origin of a Euclidean coordinate system. The camera center O_C is also called the optical center. The image captured by the camera is typically projected onto the camera plane (also called the focal plane) behind the camera center, at a negative focal length f along the z-axis. In addition, according to the imaging mechanism of cameras, the image on the camera image plane is upside-down with respect to the real scene. In the model in Fig. 1, the image plane is placed in front of the camera center, and the distance from the image plane to the center point is the focal length f. In this latter case, the image does not have to be inverted. The plane which passes through the camera center and is parallel to the image plane is denoted as the principal plane. The line passing through the camera center and perpendicular to the image plane is called the principal axis. The intersection of the principal axis with the image plane is a point called the principal point. In this camera model (Fig. 1), a 3D point in space at position $X = (x, y, z, 1)^T$ is mapped to the image plane by forming a line from the

Fig. 1. Pinhole camera geometry. The image plane is in front of the camera center O_C.

camera center to the point X; the intersection of this line with the image plane is a 2D point x lying on the image plane. Using similar triangles, the position of x with respect to the camera center can be represented as $(\frac{fx}{z}, \frac{fy}{z}, f)^T$ in the 3D space. The homogeneous representation of the 2D point x on the image plane is $(fx, fy, z)^T$, while the origin of the image plane is the same as the principal point PP.

A projective camera [] is modeled through the projection equation as

$x = P X$   (6)

where x represents the 2D point in homogeneous representation, that is, it is a

3-dimensional vector, P is a 3x4 projection matrix, and X stands for the 3D point in homogeneous representation, which is a 4-dimensional vector. The projection matrix P can be represented as

$P = K [R \mid t]$   (7)

where R is a 3x3 rotation matrix representing the orientation of the camera coordinates with respect to the world coordinates, and t is a 3x1 translation vector which shifts the camera center O_C with respect to the world coordinate system and is given by

$t = -R\, O_C$   (8)

The transformation (including rotation and translation) between different coordinate systems is shown in Fig. 2. In (7), K is the intrinsic camera matrix, called the camera calibration matrix, and is given by

$K = \begin{bmatrix} f & s & u_0 \\ 0 & \alpha f & v_0 \\ 0 & 0 & 1 \end{bmatrix}$   (9)

where f is the focal length of the camera, $\alpha$ is the aspect ratio of the pixel size on the image plane in the x and y directions, $(u_0, v_0)$ represents the coordinates of the principal point with respect to the bottom-left corner of the image plane, and s is the skew factor, which is non-zero if the x and y axes of the image coordinates are not perpendicular.
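To make (6)-(9) concrete, the following MATLAB sketch builds a calibration matrix K, a pose [R | t] and the resulting projection matrix P, and projects one 3D point. All numeric values (focal length, principal point, pose) are made-up illustrations, not parameters from the thesis data sets.

```matlab
% Illustrative pinhole projection; all numeric values are made-up examples.
f  = 800;            % focal length in pixels
u0 = 320; v0 = 240;  % principal point
K  = [f 0 u0; 0 f v0; 0 0 1];     % calibration matrix (9), zero skew, unit aspect ratio

theta = 0.1;                      % small rotation about the y-axis
R = [cos(theta) 0 sin(theta); 0 1 0; -sin(theta) 0 cos(theta)];
C = [0.5; 0; -2];                 % camera center in world coordinates
t = -R * C;                       % translation vector, as in (8)
P = K * [R t];                    % 3x4 projection matrix, as in (7)

X = [1; 2; 5; 1];                 % homogeneous 3D point
x = P * X;                        % project, as in (6)
x = x / x(3)                      % inhomogeneous pixel coordinates
```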

3.2 Epipolar Geometry and Fundamental Matrix

The geometry between two views of the same scene can be represented using the epipolar geometry.

Fig. 2. An example of a rotation and translation between different projective coordinates.

Fig. 3. Point correspondence geometry between two views.

The epipolar geometry is illustrated in Fig. 3. Suppose a 3D point X in space is projected into two views to generate the 2D points x_1 and x_2, respectively. Since three points define a plane, x_1, x_2 and X lie on a

common plane E. The plane E is denoted as the epipolar plane. The line connecting the two camera centers is the baseline between the two views, and it also lies on the epipolar plane. The intersection points of the baseline with the two image planes are the epipoles, denoted by e_1 and e_2, one in each view. The line which connects the 2D point and the corresponding epipole in the same image plane is called the epipolar line. The epipolar line l_2 in the second view corresponds to the ray through x_1 and the camera center O_C1: it is the projected image in the second view of that ray. Since the 3D point X lies on the ray through x_1 and the camera center O_C1, the projected 2D point x_2 of the 3D point X in the second view must lie on the epipolar line l_2. From the above discussion, any point x_2 in the second image that matches the point x_1 must lie on the epipolar line l_2, and the epipolar line l_2 in the second view is the mapped image of the ray through x_1 and the camera center O_C1. So, there is a mapping between a 2D point in one view and the epipolar line in the other view. The fundamental matrix F_12 is defined to represent this mapping relationship from x_1 to l_2. Similarly, the fundamental matrix F_21 represents the mapping between x_2 and l_1. The fundamental matrix is the algebraic representation of the epipolar geometry, and it is a 3x3 matrix of rank 2. The epipolar line l_2 corresponding to the 2D point x_1 is represented by

$l_2 = F_{12} x_1$   (10)

The fundamental matrix is related to the corresponding epipoles e_1 and e_2 as follows:

$e_2^T F_{12} = 0$   (11)

$F_{12} e_1 = 0$   (12)

From (11) and (12), the epipole e_1 in the first view is the right null-space of F_12, and the epipole e_2 in the second view is the left null-space of F_12. The epipoles of the two views can be computed from the fundamental matrix using the singular value decomposition (SVD). Suppose M is an m x n matrix; the singular value decomposition of M is of the form

$M = U \Sigma V^T$   (13)

where U is an m x m unitary matrix, $\Sigma$ is an m x n diagonal matrix and V is an n x n unitary matrix. The diagonal entries of $\Sigma$ are the singular values of M. The column vectors of U are the left-singular vectors of M, and the column vectors of V are the right-singular vectors of M. That is, the relationship between a corresponding left-singular vector $u_i$, right-singular vector $v_i$ and singular value $\sigma_i$ can be represented as

$M^T u_i = \sigma_i v_i$   (14)

$M v_i = \sigma_i u_i$   (15)

Since the rank of the fundamental matrix is 2, the third singular value in the diagonal matrix $\Sigma$ of the SVD of F is zero.
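As a small illustration of (11)-(15), the MATLAB sketch below constructs an exactly rank-2 fundamental matrix (using the form F = [t]_x M of (22)) and recovers both epipoles as the singular vectors associated with the zero singular value; the numeric values are made up.

```matlab
% Build an exactly rank-2 F as [t]_x * M (cf. (22)), then recover the epipoles.
t  = [1; 2; 0.5];
tx = [0 -t(3) t(2); t(3) 0 -t(1); -t(2) t(1) 0];   % skew-symmetric matrix, as in (21)
M  = [1 0.1 0; 0 1 0.2; 0 0 1];                     % arbitrary invertible 3x3 matrix
F  = tx * M;                                         % rank-2 fundamental matrix

[U, ~, V] = svd(F);
e1 = V(:, 3);    % right null vector: epipole in the first view,  F  * e1 = 0, see (12)
e2 = U(:, 3);    % left null vector:  epipole in the second view, e2' * F = 0, see (11)
```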

According to (14) and (15), if U and V are, respectively, the left-singular and right-singular matrices in the SVD of the fundamental matrix F_12, the third column of the left-singular matrix U and the third column of the right-singular matrix V correspond to the left null vector and the right null vector of the fundamental matrix F_12, respectively, and satisfy (11) and (12). Thus, the epipole e_2 in the second view is computed from the third column of the left-singular matrix U in the SVD of the fundamental matrix F_12, and the epipole e_1 in the first view is given by the third column of the right-singular matrix V in the SVD of the fundamental matrix F_12.

As stated in [], the two 2D points x_1 and x_2, each corresponding to the projection of the 3D point X into one of the two views, are related as follows:

$x_2^T F_{12} x_1 = 0$   (16)

and

$x_1^T F_{21} x_2 = 0$   (17)

From (16) and (17), the fundamental matrices F_12 and F_21 are related as

$F_{21} = F_{12}^T$   (18)

In this thesis, the fundamental matrix F_12 is denoted as F for simplicity. The fundamental matrix F can be computed from the projection matrices P_1 and P_2 of the two views as follows:

$F = [e_2]_{\times} P_2 P_1^{+}$   (19)

where $P_1^{+}$ is the pseudo-inverse of P_1, described as

$P_1^{+} = P_1^T (P_1 P_1^T)^{-1}$   (20)

and $[e_2]_{\times}$ is the skew-symmetric matrix of e_2. That is, supposing $e_2 = (a, b, c)^T$,

$[e_2]_{\times} = \begin{bmatrix} 0 & -c & b \\ c & 0 & -a \\ -b & a & 0 \end{bmatrix}$   (21)

According to (19), in the special case when $P_1 = [I \mid 0]$ and $P_2 = [M \mid m]$, the fundamental matrix reduces to []

$F = [m]_{\times} M$   (22)

It is also possible to compute F, without information about the camera projection matrices, only from corresponding image points. F can be derived up to a scale factor from a minimum of 7 point correspondences [] or using the 8-point algorithm []. The 8-point algorithm that is used here to calculate the fundamental matrix F is implemented using the Direct Linear Transformation (DLT) algorithm []. Given a set of corresponding image points $x_{1i} \leftrightarrow x_{2i}$ containing more than 8 correspondences, the first step is to normalize the corresponding points to a new set of points such that the centroid of the new points is the coordinate origin $(0, 0)^T$ and the average distance of the points from the origin is $\sqrt{2}$. This can be represented using two homogeneous transformations T_1 and T_2 as follows:

$\hat{x}_{1i} = T_1 x_{1i}$   (23)

$\hat{x}_{2i} = T_2 x_{2i}$   (24)

The second step is to use the new set of corresponding points to calculate the

fundamental matrix $\hat{F}$. Substituting $\hat{x}_{1i} = (a_{1i}, b_{1i}, 1)^T$ and $\hat{x}_{2i} = (a_{2i}, b_{2i}, 1)^T$ into (16), an expansion of (16) can be expressed as []

$a_{2i} a_{1i} f_{11} + a_{2i} b_{1i} f_{12} + a_{2i} f_{13} + b_{2i} a_{1i} f_{21} + b_{2i} b_{1i} f_{22} + b_{2i} f_{23} + a_{1i} f_{31} + b_{1i} f_{32} + f_{33} = 0$   (25)

where $\hat{F}$ is a 3x3 matrix with 9 unknown entries:

$\hat{F} = \begin{bmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{bmatrix}$   (26)

Using n sets of correspondences, a set of linear equations is formed, giving an overdetermined system of equations

$A \hat{f} = 0$,   where each row of the n x 9 matrix A is $(a_{2i} a_{1i},\, a_{2i} b_{1i},\, a_{2i},\, b_{2i} a_{1i},\, b_{2i} b_{1i},\, b_{2i},\, a_{1i},\, b_{1i},\, 1)$   (27)

and $\hat{f}$ is a column vector formed by unwrapping the fundamental matrix $\hat{F}$ row-wise as follows:

$\hat{f} = (f_{11}, f_{12}, f_{13}, f_{21}, f_{22}, f_{23}, f_{31}, f_{32}, f_{33})^T$   (28)

The least-squares solution for $\hat{f}$ can be computed using the SVD. In the SVD, A can be factorized as

$A = U \Sigma V^T$   (29)

The right-singular column vector in V that corresponds to a singular value of zero is equal to $\hat{f}$. By wrapping $\hat{f}$ back into a 3x3 matrix, the fundamental matrix $\hat{F}$ can be computed. Note that, in practice, the smallest singular value of A can be non-zero but is very small.
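A compact MATLAB sketch of the normalized 8-point algorithm of (23)-(29) is given below; it also includes the rank-2 enforcement and denormalization steps described next. The helper name normalize_transform and the exact normalization details are assumptions made for illustration, not the thesis code.

```matlab
function F = eight_point(x1, x2)
% Normalized 8-point algorithm (sketch). x1, x2: 3xN homogeneous points (N >= 8,
% third coordinate equal to 1), matched between two views.
T1 = normalize_transform(x1);  T2 = normalize_transform(x2);
x1n = T1 * x1;  x2n = T2 * x2;                      % normalization, (23)-(24)

% Build the linear system of (25)-(27): one row per correspondence.
a1 = x1n(1,:)'; b1 = x1n(2,:)';  a2 = x2n(1,:)'; b2 = x2n(2,:)';
A = [a2.*a1, a2.*b1, a2, b2.*a1, b2.*b1, b2, a1, b1, ones(size(a1))];

[~, ~, V] = svd(A);                % least-squares solution, as in (29)
F = reshape(V(:,9), 3, 3)';        % wrap f back into a 3x3 matrix, as in (28)

[U, D, V] = svd(F);                % enforce rank 2, as described below
D(3,3) = 0;
F = U * D * V';

F = T2' * F * T1;                  % undo the normalization, as in (30)
end

function T = normalize_transform(x)
% Similarity transform taking the points to zero centroid, mean distance sqrt(2).
c = mean(x(1:2,:), 2);
d = mean(sqrt(sum((x(1:2,:) - c).^2, 1)));
s = sqrt(2) / d;
T = [s 0 -s*c(1); 0 s -s*c(2); 0 0 1];
end
```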

Since the fundamental matrix should have a rank of 2, this constraint should be enforced. Once the fundamental matrix $\hat{F}$ is computed, the SVD of $\hat{F}$ is computed again, and the smallest singular value in the diagonal matrix is set to zero, so that the rank of the fundamental matrix becomes 2. Finally, by multiplying the normalization matrices T_2 and T_1 with $\hat{F}$, the fundamental matrix F is obtained as

$F = T_2^T \hat{F} T_1$   (30)

3.3 Properties of Conics and Quadrics

3.3.1 General conics and quadrics

The conic C, also called a point conic, is a curve in the 2D plane and can be described using a second-degree equation. The hyperbola, the ellipse and the parabola are the main types of conics. In the homogeneous representation, the conic is represented as a 3x3 symmetric matrix with 5 degrees of freedom. The point conic C can be represented in matrix form as

$x^T C x = 0$   (31)

where x is a 2D point on the conic. The dual conic $C^*$, also called the line conic, is a conic defined by the lines that are tangent to the point conic C:

$l^T C^* l = 0$   (32)

The lines l that satisfy (32) are tangent to the point conic C. Thus, the dual

conic $C^*$ is the adjoint matrix of the conic C. Here, the adjoint matrix of an invertible matrix M is given by

$M^* = \det(M)\, M^{-1}$   (33)

$C^*$ is called the dual conic because it is defined from lines, while C is defined from 2D points, and there is a duality between 2D points and lines. For example, in (3), the roles of the 2D point x and the line l can be interchanged, since $l^T x = 0$ implies $x^T l = 0$. In addition, the cross product of two lines is a 2D point, while the cross product of two 2D points produces a line:

$l_1 \times l_2 = x$   (34)

$x_1 \times x_2 = l$   (35)

In the same sense, there is also a duality between a 3D point and a plane. Similarly, in 3D space, the quadric Q is a surface defined from the 3D points on the quadric. Q can be represented as

$X^T Q X = 0$   (36)

where the quadric Q is a symmetric 4x4 matrix and the 3D point X is a 4x1 vector. The dual quadric $Q^*$ is a quadric defined from the planes $\pi$ that are tangent to the quadric Q:

$\pi^T Q^* \pi = 0$   (37)

So, a plane $\pi$ that is tangent to the quadric Q satisfies (37). The dual quadric is the adjoint matrix of the quadric. The intersection of a plane $\pi$ with a quadric Q

is a conic C.

3.3.2 The absolute conic and the dual absolute quadric

The absolute conic is a point conic on the plane at infinity. As a special case of the homogeneous representation of 3D points, if the fourth entry of the point vector is zero, the 3D point $(X_1, X_2, X_3, 0)^T$ is not a real point in space and is called an ideal point. The ideal points all lie on an imaginary plane called the plane at infinity. The homogeneous representation of the plane at infinity is

$\pi_\infty = (0, 0, 0, 1)^T$   (38)

so that

$\pi_\infty^T X_{ideal} = 0$   (39)

where $X_{ideal}$ is an ideal 3D point on the plane at infinity. The absolute conic $\Omega_\infty$ can be represented in matrix form as

$X^T \Omega_\infty X = 0$   (40)

where X is a 3D point lying on the absolute conic. The dual absolute quadric $Q_\infty^*$ is the dual of the absolute conic. The dual absolute quadric is a surface formed of all planes tangent to the absolute conic, and is represented in homogeneous form as a 4x4 matrix of rank 3. Here are some properties of $Q_\infty^*$ [23]:

- $Q_\infty^*$ is a degenerate quadric: it is singular and its rank is 3. It has 8 degrees of freedom.

- $Q_\infty^*$ is symmetric and positive semi-definite (PSD).
- The plane at infinity $\pi_\infty$ is a null vector of $Q_\infty^*$. That is,

$Q_\infty^* \pi_\infty = 0$   (41)

The dual absolute quadric $Q_\infty^*$ has a canonical form under the metric transformation:

$\hat{Q}_\infty^* = \begin{bmatrix} I_{3\times3} & 0 \\ 0^T & 0 \end{bmatrix} = \tilde{I}_{4\times4}$   (42)

so that the projective transformation between $Q_\infty^*$ in the projective frame and its form in the metric frame is represented as

$H Q_\infty^* H^T = \tilde{I}_{4\times4} = \mathrm{diag}(1, 1, 1, 0)$   (43)

where H is a homography matrix which transforms $Q_\infty^*$ from the projective frame to the metric one.

3.3.3 The dual image of the absolute conic (DIAC)

As stated in [], the image of the absolute conic (IAC), denoted as $\omega$, is used to represent the mapping between the points of the absolute conic on the plane at infinity and the points on the camera image plane. The IAC is the projected image of the absolute conic on the 2D image plane. The IAC is a point conic represented as

$\omega = (K K^T)^{-1} = K^{-T} K^{-1}$   (44)

where K is the intrinsic camera matrix. The dual image of the absolute conic (DIAC), denoted as $\omega^*$, is the dual of

the IAC. The DIAC is the projected image of the dual absolute quadric $Q_\infty^*$ on the image plane by a projection matrix P. This is described as

$\omega^* = P Q_\infty^* P^T$   (45)

As the DIAC is the adjoint matrix of the IAC, the DIAC can be represented using only the intrinsic camera matrix K as []

$\omega^* = K K^T$   (46)

From (46), if the DIAC is calculated, the intrinsic camera parameters can be estimated.
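Equation (46) is what makes the DIAC useful for calibration: once $\omega^*$ is known, K can be read off. One standard way to do this is through a Cholesky factorization of the IAC; the MATLAB sketch below uses a made-up $\omega^*$ built from a known K so that the recovery can be checked. It is an illustration of (44) and (46), not the thesis code.

```matlab
% Recovering K from the DIAC using (46): w_star = K * K'.
K_true = [800 0 320; 0 800 240; 0 0 1];    % made-up calibration matrix (ground truth)
w_star = K_true * K_true';                  % DIAC that auto-calibration would estimate

omega = inv(w_star);                        % IAC, see (44): omega = inv(K*K')
R = chol(omega);                            % upper-triangular R with omega = R' * R
K = inv(R);                                 % then R = inv(K), so K = inv(R)
K = K / K(3,3)                              % fix the overall scale; K matches K_true
```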

3.4 The Hierarchy of Transformations

In 3D space, points, lines and planes can be transformed using a homography matrix H. Due to the different geometric properties of the transformations, there is a hierarchy of transformations starting from the projective transformations to the affine transformations, the metric transformations, and finally the most specialized Euclidean transformations. The projective transformation is the least strict transformation among these four types, as it produces the most distortion of the original shapes of the objects in space. The projective transformation can be represented in homogeneous form as

$H_P = \begin{bmatrix} A & t \\ V^T & v \end{bmatrix}$   (47)

where A is a 3x3 matrix, t is a 3x1 translation vector, V is a 3x1 vector and v is a scalar. $H_P$ has 15 degrees of freedom (dof). The projective transformation does not preserve the orientations or similarities with respect to the original shapes. The homogeneous representation of an affine transformation is given by

$H_A = \begin{bmatrix} A & t \\ 0^T & 1 \end{bmatrix}$   (48)

where A is a 3x3 matrix and t is a 3x1 translation vector. $H_A$ has 12 degrees of freedom. The affine matrix A consists of two fundamental transformations, rotations and non-isotropic scalings in the X, Y and Z directions. Thus, the similarity of area ratios and the angles between planes are not preserved by an affine transformation, but the parallelism of planes, the ratio of areas on parallel planes and the ratio of volumes are preserved. The metric transformation is a transformation that consists of a rotation, an isotropic scaling and a translation. It can be represented in homogeneous form as

$H_M = \begin{bmatrix} sR & t \\ 0^T & 1 \end{bmatrix}$   (49)

where t is a 3x1 translation vector, s is a scalar, and R is a 3x3 rotation matrix, which is an orthogonal matrix such that

$R^T R = R R^T = I$   (50)

The metric transformation matrix $H_M$ has 7 degrees of freedom. The metric transformation is stricter than the affine transformation because it also preserves the angles between different planes. The scalar s has the effect of scaling the object, so that the volume is changed.

40 8 Table. Geometry propertes of dfferent types of transformatons. Name Matrx Defnton Dstorton Invarant Propertes Projectve (5 dof) H P A T V t v Intersecton and tangency of surfaces n contact. Parallelsm of planes, Affne ( dof) H A A T t volume ratos, centrods. The plane at nfnty. Metrc (7 dof) H M sr T t Volume ratos, angle ratos. The absolute conc. Eucldean (6 dof) R t H E T Volume, angle. The Eucldean transformaton s the strctest transformaton because t only rotates and translates the objects n the 3D space, wthout changng the rato and shape of the objects. The homogeneous representaton of the Eucldean transformaton s R t H E (5) T

where t is a 3x1 translation vector and R is a 3x3 rotation matrix, which is an orthogonal matrix. $H_E$ has 6 degrees of freedom. The definitions and properties of these transformations are summarized in Table 1.

4. IMPLEMENTED MULTI-VIEW 2D TO 3D CONVERSION SYSTEM

This chapter describes the implemented procedures of the multi-view 2D to 3D conversion system. There are five major steps in the 3D modeling process: image feature detection and registration using the scale invariant feature transform (SIFT), removing the outliers by exploiting the two-view geometry using the random sample consensus (RANSAC), estimating the projective 3D structure through triangulation, refining the projective reconstruction using bundle adjustment, and upgrading to a metric reconstruction through auto-calibration. The implementation details are discussed after the description of each stage. Section 4.1 presents an overview of the implemented 2D to 3D conversion system. The five main components of the 3D modeling system are described in Sections 4.2 to 4.6. Section 4.7 describes how to estimate the sparse depth map relative to the middle camera center. Additional implementation notes are given in Section 4.8.

4.1 Overview of the Implemented System

The performance of the 3D depth estimation improves with the number of available 2D views. In this work, 8 views from different viewpoints are processed, and they are found to be sufficient for proper depth reconstruction, as shown in Chapter 5. The components of the implemented 2D to 3D conversion system are shown in the flowchart in Fig. 4. After reading in the multiple view images, the first step for relating the multiple views to each other is to extract features in each view and match the extracted features between different views. For this purpose, an

algorithm called the scale invariant feature transform (SIFT) is used for the extraction and matching of feature points in the multiple views. Feature detection and matching is implemented between the middle view and one of the other views at a time, and the two-view geometry is estimated for each pair of views using the corresponding feature points in the two considered views. To remove the outliers in the feature matching, a robust algorithm called the random sample consensus (RANSAC) is applied to the matching feature points. In this step, the projection matrices between the middle view and the other views are estimated from the inlier feature points. In addition, the common feature points across all 8 views are determined from the set of inliers. The next step is to retrieve the structure of the 3D scene and the positions of the multiple cameras using the common feature points and the geometries of all views. For this purpose, triangulation is implemented to produce the projective reconstruction of the 3D scene, and the projection matrices of the multiple views are refined through bundle adjustment. After bundle adjustment, the refined common 3D feature points are back-projected to the multiple 2D views and the average reprojection errors of the 2D points in all views are calculated. If the average reprojection error is smaller than a threshold, the projective reconstruction of the 3D scene is upgraded to a metric one. If the reprojection error is large, the procedure is repeated starting from the RANSAC step to re-estimate the geometry and structure from new sample sets of feature points. After the metric upgrade, sparse depth maps, consisting of relative depth values at the locations of the extracted feature points that are common among all views, can be estimated.

Fig. 4. Flowchart of the implemented 2D to 3D conversion system. F_i represents the fundamental matrix between the middle view and the other views, P_i represents the projection matrices for all views, and X_BA and P_BA are the refined 3D points and projection matrices, respectively.

Details about the SIFT, RANSAC, triangulation, bundle adjustment and metric upgrade are discussed in Sections 4.2 to 4.6, respectively.

4.2 Scale Invariant Feature Transform (SIFT)

For a given multi-view image or video set, the first step is to relate the different views to each other by finding, in the multiple views, the relevant feature points that correspond to the same 3D point in space. A restricted number of corresponding points, spread over most regions of the scene, is sufficient to determine the geometric model. Thus, the first step is to detect suitable feature points in the 2D multiple views and to match the selected feature points among the different views. In the implemented system, a feature detection and tracking method called the

scale invariant feature transform (SIFT) [24] is applied to detect the feature points and generate an invariant feature descriptor for each feature point. The generated feature descriptors are further used to match the feature points in different views. The SIFT consists of five major steps: scale-space extrema detection, keypoint localization, orientation assignment, keypoint descriptor generation and keypoint matching. This algorithm is able to generate a large number of feature points that are densely distributed over a wide range of scales and most locations in the image, while being robust to scaling and rotation of the 3D viewpoint and to changes in illumination.

4.2.1 Scale-space extrema detection

The first stage of the feature detection and tracking is to find candidate locations that are invariant to scale changes across the multiple views. This is done by searching for extrema in the Gaussian scale-space. The Gaussian scale-space $L(x, y, \sigma)$ of the input image $I(x, y)$ is defined as [25]

$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$   (52)

where $G(x, y, \sigma)$ is the Gaussian function given by

$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)}$   (53)

The difference between two adjacent scales is computed to generate the difference-of-Gaussian (DOG) function, which is represented as

$D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$   (54)

The detailed procedure for the extrema detection is illustrated in Fig. 5. There are several octaves, which help to reduce the computations of the scale-space representation. In each octave of the scale-space, the scale increases from the initial scale $\sigma$ (at the beginning of the octave) to twice that value, and each octave is divided into an integer number S of intervals, so the constant factor k separating nearby scales is $k = 2^{1/S}$. At the s-th stage in an octave, the scale is $k^s \sigma$. The input image is convolved with Gaussian functions to produce the different scale-space images. Adjacent image scales are subtracted to produce the Difference of Gaussian (DOG) images. In the next octave, the Gaussian image with twice the initial $\sigma$ value is down-sampled by 2, and the same operations as in the previous octave are performed without losing accuracy with respect to $\sigma$. As illustrated in Fig. 6, at all scale levels, the maxima and minima points in the DOG images are detected by comparing a pixel (marked with X in Fig. 6) to its 26 neighbors in the 3x3 regions at the current and adjacent scales of the DOG. These extrema points are also called keypoints, and they are invariant to scale differences between different images [24].
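The construction of one octave of the DoG pyramid in (52)-(54) can be sketched in a few lines of MATLAB. The parameter values and the input file name are illustrative only, and the actual SIFT implementation used in the system differs in detail.

```matlab
% One octave of a difference-of-Gaussian pyramid (illustrative sketch,
% uses imgaussfilt from the Image Processing Toolbox).
I = im2double(imread('view4.png'));          % hypothetical input view
if size(I, 3) == 3, I = rgb2gray(I); end
sigma0 = 1.6;                 % initial scale of the octave (illustrative value)
S = 3;                        % intervals per octave
k = 2^(1/S);                  % constant factor between nearby scales, k = 2^(1/S)

L = cell(1, S+3);             % Gaussian scale-space images L(x,y,sigma), see (52)
for s = 1:S+3
    L{s} = imgaussfilt(I, sigma0 * k^(s-1));
end

D = cell(1, S+2);             % adjacent scales subtracted: DoG images, see (54)
for s = 1:S+2
    D{s} = L{s+1} - L{s};
end

% The next octave starts from the Gaussian image at twice the initial sigma,
% down-sampled by 2, and repeats the same operations.
I_next = L{S+1}(1:2:end, 1:2:end);
```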

Fig. 5. Computing the Difference of Gaussian functions at different octaves and scales.

Fig. 6. Detecting maxima and minima in the DOG functions. The point marked with X is the evaluation point. The points marked with circles are the surrounding points used to determine whether the evaluation point is an extremum or not.

4.2.2 Keypoint localization

Keypoint localization performs a more accurate localization of the keypoint according to the nearby data, and eliminates points with low contrast or that are poorly localized along an edge. By setting the derivative of the Taylor series expansion of $D(x, y, \sigma)$ [26] to zero, the offset from the originally detected point is calculated and accurate locations of the keypoints are determined. By thresholding the second-order Taylor expansion of the DOG function at the offset, the keypoints with low contrast are removed. On the edges, there are some unstable keypoints which have a large curvature across the edge but a small curvature in the perpendicular direction. The entries of a Hessian matrix are used to calculate the ratio of the curvature across the edge to the curvature in the perpendicular direction. If the ratio is larger than a threshold, the keypoint is discarded.

4.2.3 Orientation assignment

By adding the orientation information of keypoints to the content of the descriptor, the matching process becomes invariant to the rotation of objects between different views. Orientation assignment is performed at the detected keypoints by creating a gradient histogram weighted by the gradient magnitude and a circular Gaussian window. The gradient magnitude and orientation are computed for a detected feature point at location $(x, y)$ in the scale image $L(x, y, \sigma)$ as

$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}$   (55)

$\theta(x, y) = \tan^{-1}\!\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right)$   (56)

The orientation histogram includes the gradient orientations of the sample points surrounding the keypoint (in a 4x4 region around the keypoint), weighted by the gradient magnitude and by a Gaussian circular window with a standard deviation equal to 1.5 times the scale of the considered keypoint. The histogram has 36 bins representing the 360-degree orientation range. Fig. 7 illustrates an example of an orientation histogram with only 8 bins. The dominant direction of the keypoint corresponds to the peak in the orientation histogram.

4.2.4 Keypoint descriptor

From the previous steps, the keypoints are properly localized and refined, and their scale and dominant orientations are determined. The keypoint descriptors are formed to describe the features of the keypoints so that corresponding keypoints can be tracked with respect to the similar features in their descriptors. For a feature point, the 16x16 surrounding area of the keypoint is divided into 4x4 sub-regions and is used to calculate the descriptor. After Gaussian smoothing, the gradient magnitudes and orientations of the samples in the 4x4 sub-regions are calculated and an 8-bin histogram for each 4x4 sub-region is generated. The values of the orientation histogram entries are stored as a column vector to represent the descriptor of a keypoint. As the descriptor for a keypoint contains a 4x4 array of histograms, each of which contains 8 direction bins, the dimension of the keypoint descriptor is 4x4x8 = 128, as illustrated in Fig. 8.
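The gradient magnitude and orientation of (55) and (56), together with a magnitude-weighted orientation histogram, can be computed for a whole scale image with a few array operations. The MATLAB sketch below (illustrative file name and scale) omits the Gaussian weighting window for brevity.

```matlab
% Gradient magnitude and orientation of a smoothed scale image, following (55)-(56).
I = im2double(imread('view4.png'));                 % hypothetical input view
if size(I, 3) == 3, I = rgb2gray(I); end
L = imgaussfilt(I, 1.6);                            % a scale image L(x,y,sigma)

dx = L(2:end-1, 3:end) - L(2:end-1, 1:end-2);       % L(x+1,y) - L(x-1,y)
dy = L(3:end, 2:end-1) - L(1:end-2, 2:end-1);       % L(x,y+1) - L(x,y-1)
m     = sqrt(dx.^2 + dy.^2);                        % gradient magnitude, (55)
theta = atan2(dy, dx);                              % gradient orientation, (56)

edges  = linspace(-pi, pi, 37);                     % 36 bins over 360 degrees
bin    = discretize(theta(:), edges);
hist36 = accumarray(bin, m(:), [36 1]);             % magnitude-weighted histogram
```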

Fig. 7. Orientation histogram of pixels in a 4x4 sub-region. The histogram has 8 bins.

Fig. 8. Creating a keypoint descriptor using a 16x16 surrounding area of a keypoint (panels: 16x16 image gradient orientations; 4x4 keypoint descriptor). The 16x16 region is first smoothed by a Gaussian filter, illustrated by the circle. The 8-bin orientation histograms of the 4x4 sub-regions are calculated; thus, the descriptor consists of 128 entries.

In order to ensure the invariance of the descriptors to rotation, the descriptors, which consist of the orientations of gradients, are rotated by an angle $\theta$, where $\theta$ is the angle of the dominant direction of the keypoint.

4.2.5 Keypoint matching

Using the above steps, the keypoints and their descriptors in each view can be determined. The next step is to relate the keypoints in different views and to find the matching feature points between two different views. A modified k-d tree algorithm called the Best-Bin-First method [27] is used to find the descriptor of a keypoint in one view with the minimum Euclidean distance from a descriptor in the other view.

4.2.6 Implementation notes for the SIFT

In the implemented system, the corresponding features are tracked between the middle view (View 4) and the other views. The camera coordinates of the middle view are assigned to be the same as the world coordinates, so that the projection matrix for the middle view, P_4, is assumed to be $K[I_{3\times3} \mid 0_{3\times1}]$. The SIFT algorithm is applied between the middle view and one of the other views at a time; so, there are 7 loops in total for the 8 views. The function runsift calls two functions to perform the SIFT, sift and siftmatch. First, the sift function is used to return the Gaussian scale-spaces and the Difference-of-Gaussian scale-spaces. In addition, the scale, the dominant orientation and the descriptor of each keypoint are stored in the output of sift. Using the descriptors of the feature points, the coordinates of the matching points between the middle view and one of the other views are calculated using the

function siftmatch. In a single iteration, the outputs of runsift are two 2xN matrices, where N is the number of matching points. Each matrix corresponds to a view, and each column of a matrix contains the coordinates of a 2D keypoint in that view. The columns with the same index in the two matrices correspond to matching 2D keypoints. As the number of matching points between different pairs of views may not be the same, a cell array consisting of matrix elements is used to store the coordinates of the matching 2D points over all iterations. After the matching is performed between the middle view and the other 7 views, the resulting cell array consists of 14 cells. The MATLAB SIFT toolbox that is used here can be downloaded from [28], and is provided by Andrea Vedaldi.
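The per-pair matching loop described in this section might look roughly as follows. This is a sketch only: the exact signatures of the toolbox functions sift and siftmatch, the behavior of the runsift wrapper, and the image file names are assumptions made for illustration, not the thesis code.

```matlab
% Sketch of the SIFT matching loop between the middle view and the other views.
% Assumes the Vedaldi SIFT toolbox functions sift() and siftmatch() are on the path.
views = cell(1, 8);
for v = 1:8
    I = im2double(imread(sprintf('view%d.png', v)));   % hypothetical file names
    if size(I, 3) == 3, I = rgb2gray(I); end
    views{v} = I;
end

mid = 4;                                   % middle view
[framesMid, descMid] = sift(views{mid});   % frames: x, y, scale, orientation per keypoint
matchedPts = {};                           % cell array of matched 2D coordinates

for v = [1 2 3 5 6 7 8]
    [framesV, descV] = sift(views{v});
    matches = siftmatch(descMid, descV);   % 2xM indices of matching descriptors
    % store the 2xM coordinate matrices for the middle view and for view v
    matchedPts{end+1} = framesMid(1:2, matches(1, :));
    matchedPts{end+1} = framesV(1:2, matches(2, :));
end
% After the loop, matchedPts holds 14 matrices (2 per view pair), as described above.
```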

4.3 Random Sample Consensus (RANSAC)

4.3.1 Description of the RANSAC algorithm

In the feature detection and matching step, since the descriptors of the feature points, which are used for feature matching in the multiple views, are determined with respect to the local sub-region around the feature points, there may be some inaccurate matches between feature points in the multiple views. These mismatched pairs of feature points are called outliers. An outlier removal and model estimation method called the random sample consensus (RANSAC) [29] is used to remove the outliers and to estimate the two-view geometry using the inlier feature points. RANSAC is a robust estimation algorithm which operates in an opposite manner compared to conventional smoothing techniques, such as least-squares optimization [30]. Instead of using a large data set to obtain an initial solution, followed by removing the invalid data, RANSAC uses only a small initial data set to calculate the solution model and then determines the solution accuracy according to the applicability of the solution to the other data points with respect to certain prerequisites.

The brief procedure of the RANSAC algorithm is described as follows. To robustly fit a data set S to a model and remove the outliers in the data set, first, a relatively small set consisting of s samples is randomly selected from the input data set S, and the initial solution model is calculated using this small sample set. Second, the whole data set S is fitted to this solution to calculate the distance of each data point from the model. Those data samples whose distance is within a threshold t form the inlier data set S_i. S_i is the consensus data set and includes only the inliers. Third, if the proportion of data in S_i to the data in S is larger than in previous trials, the sample trial number N is re-estimated using this probability of inliers. Furthermore, if the current trial count is larger than the estimated N, the solution model is re-estimated using all the inliers in S_i and the procedure terminates. On the other hand, if the current trial count is less than the estimated sample trial number N, a new set of s samples is selected and the calculation is repeated from the first step. Finally, the largest consensus set S_i is selected after N trials, and the model is re-estimated using all the inliers in S_i.

In the above RANSAC algorithm, it is not necessary to compute every possible sample set s from all the points. The number of iterations N can be determined using the following estimation method. With a probability p (usually chosen to be 0.99), at least one of the selections is a sample set of s data points that are free from outliers. The probability that a selected sample point is an inlier is denoted as w, and thus the probability of a sample being an outlier is

$\varepsilon = 1 - w$   (57)

On one hand, $1 - p$ denotes the probability that all the selected sample sets are not free from outliers, that is, each set contains at least one outlier. On the other hand, this probability can also be represented as

$1 - p = (1 - (1 - \varepsilon)^s)^N$   (58)

where $1 - (1 - \varepsilon)^s$ is the probability that, in one sample set, there is at least one outlier. The number of sample sets N can be determined as follows:

$N = \frac{\log(1 - p)}{\log(1 - (1 - \varepsilon)^s)}$   (59)
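For example, a direct MATLAB transcription of (57)-(59), with an illustrative inlier ratio, is:

```matlab
% Number of RANSAC trials needed, following (57)-(59).
p = 0.99;           % desired probability that at least one sample set is outlier-free
s = 8;              % sample size (8 correspondences for the 8-point algorithm)
w = 0.5;            % assumed probability that a single correspondence is an inlier
epsOut = 1 - w;     % probability that a correspondence is an outlier, (57)

N = log(1 - p) / log(1 - (1 - epsOut)^s);   % (59)
N = ceil(N)         % with these values, about 1177 trials
```

As the estimated inlier ratio improves during the RANSAC iterations, the required N shrinks accordingly, which is exactly the re-estimation done in Step 6 below.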

The interest here is in applying RANSAC to estimate the two-view geometry by finding the fundamental matrix F and removing the outliers in the corresponding image feature points. The sum of the Sampson distances [] [31] of the eight pairs of corresponding feature points is used to represent the error of the fundamental matrix estimation. The sum of the Sampson distances of the eight pairs of feature points, F_error, is computed as

$F_{error} = \sum_{i=1}^{8} d_{sampson}(x_{1i}, x_{2i})$   (60)

where $x_{ji}$ is the feature point of the i-th pair of correspondences in the j-th view, $j \in \{1, 2\}$, and $d_{sampson}(x_{1i}, x_{2i})$ is the Sampson distance between $x_{1i}$ and $x_{2i}$, given by

$d_{sampson}(x_1, x_2) = \frac{(x_2^T F x_1)^2}{(F x_1)_1^2 + (F x_1)_2^2 + (F^T x_2)_1^2 + (F^T x_2)_2^2}$   (61)

where $(F x_1)_n$ is the n-th element of the product of F and $x_1$. The fundamental matrix estimation error F_error in (60) needs to be small to ensure the properness of the estimated fundamental matrix. In addition, in the calculation of the fundamental matrix F using the 8-point algorithm, the eight 3D points corresponding to the randomly selected set of 2D feature points should not lie on the same plane in 3D space. If they all lie on the same plane in 3D space, the estimated geometric model is not general enough to estimate the depth information for all the 3D points. In order to ensure that the selected 8 correspondences do not correspond to 3D points lying on the same plane, the homography matrix H between the eight pairs of corresponding 2D feature points in the two views, $x_{2i} = H x_{1i}$ (i = 1, 2, ..., 8), is computed using the SVD method as follows. Since $x_{2i} = H x_{1i}$, the cross product of $x_{2i}$ and $H x_{1i}$ is zero.
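The Sampson distance of (61) can be evaluated for a whole set of correspondences at once; a sketch assuming 3xN homogeneous point arrays is given below. Summing it over the eight sampled correspondences gives F_error in (60), and thresholding it against t selects the inliers in Step 5 below.

```matlab
function d = sampson_distance(F, x1, x2)
% Sampson distance of (61) for each correspondence.
% F: 3x3 fundamental matrix; x1, x2: 3xN homogeneous points with x2' * F * x1 ~ 0.
Fx1  = F  * x1;                          % 3xN
Ftx2 = F' * x2;                          % 3xN
num  = sum(x2 .* Fx1, 1).^2;             % (x2_i' * F * x1_i)^2 for every i
den  = Fx1(1,:).^2 + Fx1(2,:).^2 + Ftx2(1,:).^2 + Ftx2(2,:).^2;
d    = num ./ den;                       % 1xN vector of Sampson distances
end
```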

In addition, in the calculation of the fundamental matrix F using the 8-point algorithm, the eight 3D points corresponding to the randomly selected set of 2D feature points should not lie on the same plane in 3D space. If they all lie on the same plane, the estimated geometric model is not general enough to estimate the depth information for all the 3D points. In order to ensure that the selected 8 correspondences do not correspond to coplanar 3D points, the homography matrix H between the eight pairs of corresponding 2D feature points in the two views, x'_i = H x_i (i = 1, 2, ..., 8), is computed using the SVD method as follows. Since x'_i = H x_i, the cross product of x'_i and H x_i is zero. Suppose x_i = (a_i, b_i, c_i)^T and x'_i = (a'_i, b'_i, c'_i)^T; the cross product of x'_i and H x_i can then be expressed as []

    A_i h = [  0^T          −c'_i x_i^T    b'_i x_i^T
               c'_i x_i^T    0^T          −a'_i x_i^T
              −b'_i x_i^T    a'_i x_i^T    0^T        ] [h^1; h^2; h^3] = 0,      (62)

where h^j (j = 1, 2, 3) holds the entries of the j-th row of H. In (62), the matrix A_i has rank 2, and only two of the three equations are linearly independent. For the eight pairs of correspondences, by taking the first two equations of (62) for each pair, a 16 × 9 matrix B is formed such that

    B h = [ 0^T           −c'_1 x_1^T    b'_1 x_1^T
            c'_1 x_1^T     0^T          −a'_1 x_1^T
            ⋮
            0^T           −c'_8 x_8^T    b'_8 x_8^T
            c'_8 x_8^T     0^T          −a'_8 x_8^T ] [h^1; h^2; h^3] = 0.        (63)

After computing the SVD of B, the right-singular vector corresponding to the smallest singular value is the solution for h. By wrapping h back into a 3 × 3 matrix, the homography matrix H is obtained. Then, in order to compute the error in the estimation of H, the sum of the reprojection errors between the corresponding 2D feature points is calculated as

    H_error = Σ_{i=1}^{8} d(H x_i, x'_i)^2.                              (64)

If the 3D points corresponding to the eight pairs of 2D feature points do not lie on the same plane, the reprojection error H_error is large.
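The following MATLAB sketch illustrates the SVD-based estimation of H and the error of (64) under the assumptions stated above (x1 and x2 are 3 × 8 homogeneous point matrices); the function name is illustrative only and is not part of the implemented system.

    % Hedged sketch of (62)-(64): DLT estimate of H from 8 correspondences.
    function [H, H_error] = homography_check(x1, x2)
        B = zeros(16, 9);
        for i = 1:8
            x = x1(:, i)';  a = x2(1, i);  b = x2(2, i);  c = x2(3, i);
            B(2*i-1, :) = [zeros(1, 3), -c*x,  b*x];   % first equation of (62)
            B(2*i,   :) = [ c*x, zeros(1, 3), -a*x];   % second equation of (62)
        end
        [~, ~, V] = svd(B);
        h = V(:, end);             % right-singular vector of the smallest sigma
        H = reshape(h, 3, 3)';     % rows of H are h^1, h^2, h^3
        H_error = 0;               % reprojection error of (64)
        for i = 1:8
            xp  = H * x1(:, i);  xp  = xp / xp(3);
            x2n = x2(:, i) / x2(3, i);
            H_error = H_error + sum((xp(1:2) - x2n(1:2)).^2);
        end
    end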

In this context, the RANSAC procedure [] can be summarized as follows.

Step 1: Initial values are set for s = 8, the inlier probability w, p = 0.99, the threshold t_HF and the threshold t, where s is the size of the random sample set, w is the probability that a sample is an inlier, p is the probability that at least one selected sample set contains only inliers, t_HF is the threshold on the ratio of H_error to F_error, and t is the threshold used to select the inliers.

Step 2: The number of repetitions N is calculated according to (59).

Step 3: A random sample set of 8 correspondences (s = 8) is selected and the fundamental matrix F is calculated using the 8-point algorithm as described in Section 3.2.

Step 4: The ratio R = H_error / F_error is used to evaluate the accuracy of the obtained solution for the fundamental matrix. If R is greater than the specified threshold t_HF, the solution is deemed satisfactory and the procedure proceeds to the next step (Step 5); otherwise, the procedure is repeated from Step 3 by randomly selecting another eight pairs of corresponding feature points.

Step 5: For all pairs of corresponding feature points x_i and x'_i, the Sampson distance d_sampson(x_i, x'_i) in (61) is calculated. The inliers that are consistent with F are the feature points whose Sampson distance is smaller than the selected threshold t.

Step 6: The ratio of the number of inliers to the size of the whole feature point set is calculated and taken as the probability w that a data point is an inlier.

If w is larger than the previously computed value, the sample trial number N is re-estimated using (59). Furthermore, if the current trial count exceeds the estimated N, the procedure goes to Step 7; otherwise, if the current trial count is less than N, the procedure repeats from Step 3.

Step 7: After N trials, F is re-estimated using the largest set of inliers.

Based on the computed fundamental matrix F, the projection matrices P and P' for the two views can be calculated. Suppose F is the fundamental matrix between Views 1 and 2, so that it satisfies (6). The projection matrices P and P' are determined as described below. According to [], given the fundamental matrix F, the pair of camera projection matrices can be defined as

    P  = K [I | 0],                                                      (65)
    P' = K [S F | e'],                                                   (66)

where S is any skew-symmetric matrix and e' is the epipole in the second view. A skew-symmetric matrix is a square matrix whose transpose is equal to its negative,

    S^T = −S.                                                            (67)

The projection matrices P = K [I | 0] and P' = K [A + a v^T | λ a], where λ is a non-zero scalar and a and v are 3 × 1 vectors, have the same fundamental matrix as the canonical pair P = K [I | 0] and P' = K [A | a].

By assigning the skew-symmetric matrix S in (66) to be [e']_×, the general form of the camera projection matrices can be expressed as follows []:

    P  = K [I | 0],                                                      (68)
    P' = K [[e']_× F + e' v^T | λ e'],                                   (69)

where v is any 3 × 1 vector, λ can be any non-zero scalar, and the epipole e' is the left singular vector corresponding to the smallest singular value of the SVD of the fundamental matrix F.
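A minimal MATLAB sketch of (68)-(69) is shown below; it assumes the particular choice v = 0 and λ = 1, and the function name and arguments (a fundamental matrix and an intrinsic matrix K) are illustrative rather than the thesis code.

    % Hedged sketch of (68)-(69): build a projection matrix from F.
    function P2 = camera_from_F(F, K)
        [U, ~, ~] = svd(F);
        e = U(:, 3);                       % left singular vector, smallest sigma
        e_x = [   0   -e(3)   e(2);
                e(3)     0   -e(1);
               -e(2)   e(1)     0 ];       % skew-symmetric matrix [e]_x
        P2 = K * [e_x * F, e];             % (69) with v = 0 and lambda = 1
    end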

4.3.2 Implementation notes for RANSAC

The RANSAC algorithm is implemented to remove the outliers from the matching feature points calculated using SIFT, and to estimate the fundamental matrix F between the middle view and each of the other views. Since the projection matrix for the middle view P_4 is assigned to be K [I_{3×3} | 0_3], the projection matrix of the i-th view is assigned as P_i = K [[e_i]_× F_i4 | e_i], where F_i4 denotes the fundamental matrix between the 4th view (the middle view) and the i-th view (x_i^T F_i4 x_4 = 0). The epipole in the i-th view, e_i, is the third column of the left basis matrix of the SVD of F_i4, corresponding to the zero singular value.

The ransacfitfundmatrix function is used for RANSAC. In each loop, two cells of the 2D corresponding feature points, with the first cell corresponding to the middle view, are taken out of the SIFT cell array and used as the input to this function. As the projection matrix for the first input is assigned to be K [I_{3×3} | 0_3] in this function, the first input element is the set of 2D feature points in the middle view after SIFT, and the second input element is the set of 2D feature points in the i-th view after SIFT. The output of this function is the fundamental matrix F_i4 and the indices of the inlier 2D feature points between the middle view and the i-th view. In each run of RANSAC, the inliers between two views are calculated; this procedure is carried out between the middle view and each of the other views, i.e., seven times. Usually, the number of common inliers among multiple views becomes smaller as the number of views increases. After RANSAC, the inliers of the 2D feature points that are common to all the considered eight views are output, and the projection matrices for all views are calculated.

The RANSAC toolbox is provided by Peter Kovesi from The University of Western Australia and can be obtained from [3]. This software is modified in our implementation by adding Step 4 of the RANSAC procedure discussed in Section 4.3.1.

4.4 Triangulation

4.4.1 Introduction of the triangulation method

From the RANSAC algorithm, the inliers among the corresponding 2D feature points and the projection matrices are calculated for the multiple views. The triangulation algorithm is implemented to estimate the 3D geometry with respect to the common 2D feature points in all views. Ideally, the intersection of the two lines formed by connecting each of the matching 2D points with its corresponding camera center could be computed directly to obtain the corresponding 3D point in space.

However, due to the presence of noise and digitization errors, the intersection of these two rays may not exist in 3D space; this is why triangulation is needed for the 3D point estimation. In the implemented system, the 3D points are reconstructed using a simple SVD-based algorithm similar to the one described in Section 3.2. For each 2D feature point, (6) is used to relate the 2D point x and the corresponding 3D point X. The cross product of the 2D point x_i and P_i X is set to zero for the corresponding 2D points in the 8 views, x_i (i = 1, 2, ..., 8), as

    x_i × (P_i X) = 0.                                                   (70)

Suppose x_i = (a_i, b_i, c_i)^T (i = 1, ..., 8). By selecting the first two equations from (70) for the 8 corresponding points, 16 equations are obtained:

    a_i (p_i^{3T} X) − p_i^{1T} X = 0,
    b_i (p_i^{3T} X) − p_i^{2T} X = 0,      i = 1, ..., 8,               (71)

where p_i^{jT} denotes the j-th row of P_i (i = 1, ..., 8). This equation set can be expressed in the form

    A X = 0,                                                             (72)

where A is a 16 × 4 matrix represented as

    A = [ a_1 p_1^{3T} − p_1^{1T}
          b_1 p_1^{3T} − p_1^{2T}
          ⋮
          a_8 p_8^{3T} − p_8^{1T}
          b_8 p_8^{3T} − p_8^{2T} ].                                     (73)

To solve for the 3D coordinates of X, the SVD of A is computed as

    A = U Σ V^T.                                                         (74)

If the singular values in Σ are arranged in descending order, the solution for X is the last column of V.
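A minimal MATLAB sketch of this linear triangulation is given below; pts is assumed to be a 3 × M matrix of homogeneous 2D points (one per view) and Ps a 3 × 4 × M array of the corresponding projection matrices, and the function name is illustrative.

    % Hedged sketch of (70)-(74): SVD-based linear triangulation of one 3D point.
    function X = triangulate_linear(pts, Ps)
        M = size(pts, 2);
        A = zeros(2 * M, 4);
        for i = 1:M
            a = pts(1, i) / pts(3, i);             % normalized image coordinates
            b = pts(2, i) / pts(3, i);
            P = Ps(:, :, i);
            A(2*i-1, :) = a * P(3, :) - P(1, :);   % a_i p_i^3T - p_i^1T
            A(2*i,   :) = b * P(3, :) - P(2, :);   % b_i p_i^3T - p_i^2T
        end
        [~, ~, V] = svd(A);
        X = V(:, end);                             % last column of V
        X = X / X(4);                              % homogeneous 3D point
    end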

The above triangulation procedure assumes that there is no noise in the estimated 2D points. Suppose there is a noisy matching pair of 2D feature points, x ↔ x', which do not match each other exactly and do not satisfy the epipolar constraint in (6). According to the work of Hartley and Sturm [33], a constrained MSE-based triangulation method is used to find corrected coordinates of the 2D feature points. The corrected points x̂ and x̂' should lie close to the noisy points and should also satisfy the epipolar constraint in (6). These correct matching feature points are localized by minimizing the Euclidean distance function

    d(x, x̂)^2 + d(x', x̂')^2                                             (75)

subject to the epipolar constraint

    x̂'^T F x̂ = 0.                                                       (76)

With the knowledge of the epipolar geometry discussed in Section 3.2, any pair of corresponding points must lie on a pair of corresponding epipolar lines l and l' in the two views, and any pair of matching points lying on these two lines satisfies the epipolar constraint. The optimal 2D points x̂ and x̂', which are closest to the original matching points, lie on a pair of epipolar lines l and l', respectively. The distance in (75) can therefore be represented using the distances of the noisy points to the epipolar lines:

    d(x, l)^2 + d(x', l')^2,                                             (77)

where d(x, l) represents the perpendicular distance from point x to line l. As indicated before, the corrected matching 2D points x̂ and x̂' lie on these two epipolar lines and can be found by representing the epipolar line l in the first image by a parameter t as l(t). Using the fundamental matrix F, the other epipolar line l' is related to l. Thus, the distance function in (77) can be represented as a polynomial function of t. The parameter t_min that minimizes the polynomial distance function is computed by finding the real roots of the numerator of its derivative and evaluating the distance function at each real root. Then, the two epipolar lines at t_min and the corrected 2D points x̂ and x̂' on these lines can be calculated. The corrected 2D points x̂ and x̂' are used to compute the corresponding 3D point X using the SVD as discussed before.

At the end of triangulation, the projective 3D structure is reconstructed, providing the 3D points X, in addition to the already computed set of projection matrices P_i and the 2D feature points x_i for all views.

4.4.2 Implementation notes for triangulation

Using the common matching 2D feature points x_i and the projection matrices P_i for all views, the 3D feature points are calculated through triangulation. The function vgg_X_from_xP_lin implements the triangulation under the assumption of noise-free matches. The input of this function consists of two matrices: the first stores the coordinates of the 8 matching points, one in each view, corresponding to the same 3D point, and the second is a three-dimensional matrix storing the projection matrices P_i of all views. This function uses the SVD method to calculate the 3D points as described in Section 4.4.1 and assumes that there is no error in the feature matching. The triangulation toolbox is provided by Tomas Werner from the University of Oxford and can be downloaded from [34].

4.5 Bundle Adjustment

4.5.1 Description of the bundle adjustment algorithm

Once the 3D points and projection matrices have been obtained for all views, the 3D structure needs to be refined through a global minimization step, because both the 3D points and the projection matrices derived from the fundamental matrices are susceptible to noise. This can be addressed using the maximum likelihood estimate produced by the bundle adjustment algorithm [35]. The goal is to find the projective structures P_i of the multiple views and the 3D points X_j such that the mean square distances between the observed 2D image feature points x_ij and the reprojected 2D points P_i X_j are minimized.

Bundle adjustment is illustrated in Fig. 9. The rays emanating from a 3D point and reprojected onto the image planes of the multiple views form a bundle. Through bundle adjustment, given m views, a new set of projection matrices P_i^BA (i = 1, ..., m) and 3D space points X_j^BA (j = 1, ..., n) is calculated so that the reprojected 2D points x_ij^BA = P_i^BA X_j^BA become stable. The reprojected 2D points x_ij^BA need to minimize the following Euclidean distances from the initial 2D feature points x_ij:

    min_{P_i^BA, X_j^BA} Σ_{i=1}^{m} Σ_{j=1}^{n} d(x_ij, x_ij^BA)^2.      (78)

A typical method of sparse bundle adjustment uses the Levenberg-Marquardt (LM) algorithm [36] [37] to perform the non-linear minimization of the reprojection error. The LM algorithm is an iterative procedure that finds the minimum of a non-linear least-squares problem. Given an initial measurement vector w, an initial parameter vector v, and a functional relation f which maps the parameter vector v to an estimated measurement vector ŵ = f(v), the objective is to iteratively find the parameter vector v that minimizes the squared Σ_w-norm ε^T Σ_w^{-1} ε, where ε = w − ŵ and Σ_w is the covariance matrix of the uncertainty of the measurement vector w. This is done by solving the following equation iteratively to obtain the update Δv:

    (J^T Σ_w^{-1} J + μ I) Δv = J^T Σ_w^{-1} ε,                           (79)

where J is the Jacobian matrix of f, μ is the damping term that ensures a reduction of the error across iterations, I is the identity matrix, and Δv is the difference between the estimated parameter vector and the previous parameter vector v. After solving for Δv, the updated parameter vector is v + Δv.

Fig. 9. Illustration of bundle adjustment. The rays back-projected from the corresponding 2D feature points in different views to a single 3D point constitute a bundle.

According to [35], in order to use the LM algorithm for bundle adjustment, the initial measurement vector w consists of the observed common 2D feature points in all views, the initial parameter vector v is defined by all the parameters of the projection matrices of all views and by the 3D points, and the functional relation f is given by the projection relationship between the corresponding 2D and 3D points.

The measurement vector w can be represented as

    w = (x_11^T, ..., x_1m^T, x_21^T, ..., x_2m^T, ..., x_n1^T, ..., x_nm^T)^T,   (80)

where m is the number of views, n is the number of common feature points in each view, and x_ij is the i-th 2D point in the j-th view. The parameter vector v can be represented as

    v = (p_1^T, ..., p_m^T, X_1^T, ..., X_n^T)^T,                          (81)

where p_k is the unwrapped vector representation of the projection matrix corresponding to the k-th view, and X_s is the s-th 3D point. The difference parameter vector Δv is solved iteratively according to (79), and the updated parameters of the projection matrices of all views and of the 3D points in space are obtained by adding Δv to the original v.
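As a minimal sketch of the functional relation f used in (79)-(81), the following MATLAB function maps a stacked parameter vector to the predicted measurements, so that ε = w − f(v); the layout of v (column-stacked 3 × 4 projection matrices followed by homogeneous 3D points) is an assumption made here for illustration, not the layout used by the bundleadjustment toolbox described below.

    % Hedged sketch of f(v) for LM bundle adjustment: reproject all points.
    function w_hat = reprojection_f(v, m, n)
        % v = [p_1; ...; p_m; X_1; ...; X_n], p_k = P_k(:) (12x1), X_j (4x1)
        w_hat = zeros(2 * m * n, 1);
        idx = 1;
        for j = 1:n                                   % loop over 3D points
            X = v(12*m + 4*(j-1) + (1:4));
            for i = 1:m                               % loop over views
                P = reshape(v(12*(i-1) + (1:12)), 3, 4);
                x = P * X;                            % reprojected homogeneous point
                w_hat(idx:idx+1) = x(1:2) / x(3);     % inhomogeneous image coordinates
                idx = idx + 2;
            end
        end
    end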

4.5.2 Implementation notes for bundle adjustment

The function bundleadjustment is used to implement the bundle adjustment for the refinement of the projective reconstruction, so that the reprojection distance function (78) is minimized using the Levenberg-Marquardt algorithm. The 2D feature points of all views, the projection matrices of all views, and the 3D points estimated from triangulation are used as the input to this function. The output provided by bundleadjustment consists of the refined projection matrices of all views and the coordinates of the 3D points.

The projection matrix for the middle camera, P_4, is set to [I_{3×3} | 0_3] in the RANSAC procedure. In the bundle adjustment procedure, the projection matrix of the middle view also needs to be kept fixed so that the coordinates of the middle camera remain the same as the world coordinates. In our MATLAB implementation, the function bundleadjustment provides an option to fix a certain number of projection matrices, starting from the first input projection matrix. Thus, instead of using the projection matrices, 2D points and 3D points ordered from the first view to the eighth view, the middle-view quantities P_4 and x_4 are reordered to the first place, and the reordered projection matrices, 2D points and 3D points are used as the input to the bundle adjustment function. After running the bundle adjustment, the output projection matrices P^BA and 3D points X^BA are reordered back to the original order.

With P^BA and X^BA, in order to check the difference between the reprojected points x^BA and the original 2D points x, the 3D points X^BA are reprojected into the 2D views as

    x_ij^BA = P_i^BA X_j^BA.                                              (82)

After bundle adjustment, the average deviation of the resulting x^BA from the original x over all views is calculated. If the average deviation is larger than a threshold, the reconstructed 3D scene is not accurate enough and contains too much noise; the program then goes back to RANSAC and runs the triangulation and bundle adjustment again. This is because, in RANSAC, the 8 points used to compute the fundamental matrix F are chosen randomly. If the average deviation over all views is less than the threshold, the next step, called metric upgrade, is carried out.

The threshold was adjusted for different image sets based on the number of iterations needed to run the algorithm and in order to enforce convergence within a reasonable range. The values of the threshold on the average deviation differ across image sets; they are listed in Table 2 for three sample image sets (refer to Chapter 5, Section 5.1, for a description of these image sets).

Table 2. Average reprojection error threshold t_avg for the different image sets (Microsoft, Building and Temple).

The bundle adjustment toolbox is the Vincent toolbox, provided by Vincent Rabaud from UCSD. It can be downloaded from [38].

4.6 Metric Upgrade

4.6.1 Description of metric upgrade

After the bundle adjustment, the 3D model can be reconstructed but, at this point, the reconstruction is only defined up to a projective transformation, which is not sufficient to represent the proper structure of the scene. Therefore, a method to upgrade the projective reconstruction to a metric one is implemented. Auto-calibration is the process of determining the internal camera parameters and the metric reconstruction directly from multiple uncalibrated views. Unlike other calibration methods, which depend either on the image of a calibration grid or on known properties of the scene such as vanishing points, auto-calibration can be carried out directly by imposing constraints on the internal camera parameters and the external projection parameters.

In the implementation of the metric upgrade, the goal is to find a rectifying homography H that transforms the projective reconstruction (P_proj, X_proj) to the metric reconstruction as P_metric = P_proj H and X_metric = H^{-1} X_proj. From [], the homography H for the metric transformation is given by

    H = [  K        0
          −p^T K    1 ],                                                  (83)

where K is the intrinsic camera matrix and the coordinates of the plane at infinity in the projective reconstruction are represented as π_∞ = (p^T, 1)^T.

As discussed in Section 3.3, in order to find H, one needs the homography that transforms Q*_∞ to its canonical form. The dual image of the absolute conic (DIAC) is the projected image of the dual absolute quadric Q*_∞ under a given projection P, as presented in (45). The dual absolute quadric Q*_∞ in projective space can be transformed to its metric canonical form Î_{4×4} by the homography H, as in (43). Furthermore, the DIAC is an entity in the image plane of the camera that depends only on the intrinsic camera parameters, as in (46). By substituting the camera matrix K into (46), the DIAC ω* = K K^T can be represented as

    ω* = [ f_x^2 + s^2 + u_0^2    s f_y + u_0 v_0    u_0
           s f_y + u_0 v_0        f_y^2 + v_0^2      v_0
           u_0                    v_0                1   ].               (84)

The idea of auto-calibration is to use (84) to transfer the constraints on the intrinsic camera matrix to constraints on the dual absolute quadric Q*_∞ under the projection matrices P_i, and to solve for the homography matrix H after Q*_∞ has been estimated.

The dual absolute quadric Q*_∞ is calculated through a polynomial minimization using Linear Matrix Inequality (LMI) relaxations [39]. The constrained optimization problem can be stated as

    min f(x)                                                              (85)
    subject to  g_i(x) ≥ 0,  i = 1, 2, ..., M.                             (86)

The LMI relaxations are computed by adding lifting variables and constraints that linearize the monomials in (85) and (86) up to a given degree. If the degree of the monomials is up to 2δ, the order of the LMI relaxation is said to be δ. The constraints on the camera matrix can be expressed using the coefficients of Q*_∞ and serve as the objective function f(x), while the constraints g_i(x) consist of several constraints on Q*_∞ [3]. The objective function for the minimization is represented as

    f(Q*_∞) = Σ_i [ (ω_12^i)^2 + (ω_13^i)^2 + (ω_23^i)^2 + (ω_11^i − ω_22^i)^2 ],   (87)

where ω_jk^i = p_i^j Q*_∞ (p_i^k)^T and p_i^k denotes the k-th row of the i-th camera projection matrix. For determining the objective function, it is assumed that the skew factor is zero (s = 0), that the focal lengths in the x and y directions are equal (f_x = f_y), and that the principal point lies at the bottom-left origin of the image plane (u_0 = v_0 = 0), so that the DIAC can be represented by diag(f^2, f^2, 1).

The objective function in (87) enforces these conditions on the DIAC. The polynomial minimization is subject to the following constraints:

(1) Q*_∞ has to be positive semi-definite (PSD) so that ω* is PSD and can therefore be decomposed as ω* = K K^T. Positive semi-definite means that all the eigenvalues of Q*_∞ are greater than or equal to zero; this constraint is fulfilled if all of the principal minors of Q*_∞ are non-negative.

(2) To ensure that Q*_∞ is rank deficient, the determinant of Q*_∞ is set to zero.

(3) To fix the scale of Q*_∞, the Frobenius norm of Q*_∞ is set to 1.

An equivalent mathematical representation of the constrained polynomial minimization is as follows:

    min f(Q*_∞) = Σ_i [ (ω_12^i)^2 + (ω_13^i)^2 + (ω_23^i)^2 + (ω_11^i − ω_22^i)^2 ]   (88)

subject to

    det(Q*_∞) = 0,                                                        (89)
    Δ_jk(Q*_∞) ≥ 0,  j = 1, 2, 3,  k = 1, 2, ..., C(4, j),                 (90)
    ‖Q*_∞‖_F = 1,                                                        (91)

where ω_jk^i = p_i^j Q*_∞ (p_i^k)^T, p_i^k is the k-th row of the i-th camera projection matrix, det(Q*_∞) is the determinant of Q*_∞, Δ_jk(Q*_∞) is a principal minor of Q*_∞, and ‖Q*_∞‖_F is the Frobenius norm of Q*_∞. The principal minor Δ_jk(Q*_∞) is the determinant of the k-th (4 − j) × (4 − j) sub-matrix of Q*_∞, obtained by removing j rows and the j columns with the same indices from Q*_∞; for the 4 × 4 matrix Q*_∞ there are C(4, j) sub-matrices that can be formed in this way for each j. The Frobenius norm of a matrix is the square root of the sum of the squares of all its entries.
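The MATLAB sketch below illustrates how the cost of (87)-(88) can be evaluated for a candidate quadric; it is only an illustration of the quantity being minimized, with the scale of Q assumed fixed by (91), and it is not the GloptiPoly/SeDuMi formulation actually used in the implementation (described in Section 4.6.2).

    % Hedged sketch of (87): sum of squared DIAC terms over all views.
    % Q is a candidate 4x4 dual absolute quadric, Ps a 3x4xm array of
    % normalized projection matrices.
    function f = autocal_objective(Q, Ps)
        f = 0;
        for i = 1:size(Ps, 3)
            w = Ps(:, :, i) * Q * Ps(:, :, i)';    % DIAC of view i, as in (45)-(46)
            f = f + w(1,2)^2 + w(1,3)^2 + w(2,3)^2 + (w(1,1) - w(2,2))^2;
        end
    end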

After solving for Q*_∞ through the LMI relaxation, the SVD of Q*_∞ is computed to derive the homography H, as explained in detail below. The 3D feature points X_metric and projection matrices P_metric under the metric transformation are then obtained as

    X_metric = H^{-1} X_proj,                                             (92)
    P_metric = P_proj H.                                                  (93)

The relevant sparse depth information of the scene can be obtained from the 3D points X_metric in (92).

4.6.2 Implementation notes for metric upgrade

The first step in the metric upgrade is to calculate the dual absolute quadric Q*_∞. Q*_∞ is represented using symbolic parameters in the MATLAB symbolic toolbox; as Q*_∞ is a 4 × 4 symmetric matrix, 10 symbolic parameters are used to represent it. In the LMI relaxation, the objective function f(Q*_∞) and the constraint equations can be expressed in terms of the symbolic parameters of Q*_∞. The GloptiPoly toolbox and the SeDuMi toolbox are used here to calculate the optimal solution of Q*_∞ that minimizes the objective function f(Q*_∞).

GloptiPoly is a MATLAB toolbox that helps solve the LMI relaxations of global optimization problems; it makes use of the SeDuMi toolbox for the LMI relaxations of non-convex optimization problems. Using the function defipoly, a cell array is obtained that stores the minimization function and the constraint equations described in Section 4.6.1, with the first element of the cell array being the objective function to minimize. The function gloptipoly is then used for the global optimization: the cell array containing the objective function and constraint equations serves as its input, and the order of the LMI relaxation is set to 2 [3]. If an optimal solution minimizing the objective function f(Q*_∞) exists, the obtained values of the symbolic parameters of Q*_∞ are stored in the output.

After solving for the dual absolute quadric Q*_∞ through the global optimization, the homography H is calculated using the SVD according to (43). The detailed procedure for solving for H is described below. Using the SVD, Q*_∞ can be decomposed as

    Q*_∞ = U D V^T,                                                       (94)

where U and V are unitary matrices and D is a diagonal matrix with non-negative real numbers (the singular values) on its diagonal. Since Q*_∞ is a symmetric matrix, in the SVD of Q*_∞ given by (94),

    U = V.                                                                (95)

The diagonal entries d_ii of D are arranged in descending order, and d_44 is nearly 0.

D can be represented as

    D = diag(d_11, d_22, d_33, d_44),                                     (96)

where d_44 ≈ 0. As D is a diagonal matrix, it can be further decomposed as

    D = D_sqrt D_sqrt,  with  D_sqrt = diag(√d_11, √d_22, √d_33, √d_44),  (97)

and (94) can be expressed as

    Q*_∞ = U D_sqrt I_{4×4} D_sqrt V^T,                                   (98)

where I_{4×4} is the 4 × 4 identity matrix. Since d_44 ≈ 0, D_sqrt I_{4×4} D_sqrt can be approximated as

    D_sqrt I_{4×4} D_sqrt ≈ diag(d_11, d_22, d_33, 0),                    (99)

and (99) can be rewritten as

    diag(d_11, d_22, d_33, 0) = diag(√d_11, √d_22, √d_33, 1) · diag(1, 1, 1, 0) · diag(√d_11, √d_22, √d_33, 1).   (100)

Let

    D̂_sqrt = diag(√d_11, √d_22, √d_33, 1)                                (101)

and

    Î_{4×4} = diag(1, 1, 1, 0);                                           (102)

it follows that

    D_sqrt I_{4×4} D_sqrt ≈ D̂_sqrt Î_{4×4} D̂_sqrt.                       (103)

Using (103), Q*_∞ in (98) can be written as

    Q*_∞ ≈ U D̂_sqrt Î_{4×4} D̂_sqrt V^T.                                  (104)

By multiplying (104) by (U D̂_sqrt)^{-1} on the left and by (D̂_sqrt V^T)^{-1} on the right, it follows that

    (U D̂_sqrt)^{-1} Q*_∞ (D̂_sqrt V^T)^{-1} ≈ Î_{4×4}.                    (105)

From (43), (95) and (105), the homography matrix H is expressed as

    H = U D̂_sqrt.                                                         (106)

Finally, knowing the homography H, the coordinates of the 3D points and the projection matrices in the metric reconstruction, X_metric and P_metric, are obtained according to (92) and (93), respectively. The GloptiPoly toolbox is provided by D. Henrion and can be downloaded from [4]. The SeDuMi toolbox is provided by Lehigh University at the link given in [4].
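A minimal MATLAB sketch of this SVD-based recovery of H and of the metric transformation (92)-(93) is given below; the function name and argument layout (a 3 × 4 × m array of projective cameras and a 4 × N matrix of homogeneous projective points) are assumptions made for illustration.

    % Hedged sketch of (94)-(106) followed by (92)-(93).
    function [H, Ps_metric, Xs_metric] = metric_from_quadric(Q, Ps_proj, Xs_proj)
        [U, D, ~] = svd(Q);                    % Q is symmetric, so U = V (95)
        d = diag(D);
        Dhat_sqrt = diag([sqrt(d(1:3)); 1]);   % (101), using d_44 ~ 0
        H = U * Dhat_sqrt;                     % (106)
        Xs_metric = H \ Xs_proj;               % (92)
        Ps_metric = zeros(size(Ps_proj));
        for i = 1:size(Ps_proj, 3)
            Ps_metric(:, :, i) = Ps_proj(:, :, i) * H;   % (93)
        end
    end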

4.7 Transformation of 3D Points in Metric Reconstruction

After transforming the projective reconstruction by the homography H, the projection matrix of the middle camera is no longer [I_{3×3} | 0_3]: it has been transformed by a translation vector t_H and rotated by a rotation matrix R_H. The coordinates of the 3D points are expressed with respect to the world coordinate frame, but they are no longer consistent with the transformed coordinate frame of the middle camera in the metric reconstruction. An example of the transformation geometry between the world coordinates and the middle-camera coordinates is illustrated in Fig. 10.

Fig. 10. Illustration of the 3D points with respect to the world coordinates and the middle-camera coordinates in the metric transformation frame.

As a consequence, the 3D points may not all lie on the same side of the middle camera center after the metric transformation, which results in depth values of the 3D feature points that are either positive or negative with respect to the middle-camera coordinates. To solve this problem, the coordinates of the 3D points in the metric frame need to be transformed by the rotation matrix R_H and the translation vector t_H so as to be consistent with the middle camera. R_H and t_H can be computed from the decomposition of the projection matrix P_4_metric of the middle view in the metric reconstruction, as follows.

From (7) and (8), the homogeneous representation of the projection matrix P_4_metric can be expressed as

    P_4_metric = K [R | t] = [M | K t],                                   (107)

where M = K R is a 3 × 3 matrix, K is the intrinsic camera matrix and is an upper-triangular matrix, R is the rotation matrix, and t is the translation vector. As M = K R with K upper-triangular, an RQ decomposition is used to calculate K and R. The RQ decomposition uses Givens rotation matrices to obtain the upper-triangular matrix K. The Givens rotations have three types:

    G_x = [ 1        0           0
            0    cos θ_x    −sin θ_x
            0    sin θ_x     cos θ_x ],                                   (108)

    G_y = [  cos θ_y    0    sin θ_y
                0       1       0
            −sin θ_y    0    cos θ_y ],                                   (109)

    G_z = [ cos θ_z    −sin θ_z    0
            sin θ_z     cos θ_z    0
               0           0       1 ].                                   (110)

Multiplying M on the right by G_x leaves the first column of M unchanged; similarly, the second and third columns of M remain the same when M is multiplied on the right by G_y and G_z, respectively. The first step of the RQ decomposition is to multiply M on the right by G_x to generate M^x and to set m^x_32 to zero, where m^x_32 is the element of M^x located in the third row and second column. The rotation angle θ_x in G_x is calculated from

    m_32 cos θ_x + m_33 sin θ_x = 0,                                      (111)

where m_jk is the element of M in the j-th row and k-th column, so that

    cos θ_x = m_33 / √(m_32^2 + m_33^2),                                  (112)
    sin θ_x = −m_32 / √(m_32^2 + m_33^2),                                 (113)

and G_x is then computed using (108), (112) and (113). After setting m^x_32 to zero, M^x is multiplied on the right by G_y to generate M^{x,y}, and m^{x,y}_31 is set to zero; G_y is computed in a similar way to G_x. Finally, M^{x,y} is multiplied on the right by G_z to generate M^{x,y,z}, and m^{x,y,z}_21 is set to zero; G_z is again computed similarly to G_x and G_y.

After these operations, an upper-triangular matrix K is obtained as

    K = M^{x,y,z} = M G_x G_y G_z.                                        (114)

Since G_x, G_y and G_z are unitary matrices, the rotation matrix R can be recovered as

    R = G_z^T G_y^T G_x^T.                                                (115)

The translation vector t is calculated as

    t = K^{-1} p_4,                                                       (116)

where p_4 is the fourth column of the projection matrix P_4_metric. In this work, the rotation matrix and the translation vector are obtained from P_4_metric using the function vgg_KR_from_P, which is included in the triangulation package introduced in Section 4.4.2. The resulting transformation matrix T is expressed as

    T = [ R_H      t_H
          0_3^T     1  ].                                                  (117)

Multiplying the 3D points obtained after the metric upgrade by T transforms their coordinates so that they are consistent with the middle-camera coordinates in the metric frame, that is,

    X_metric_trans = T X_metric.                                          (118)

The sparse depth values of the common feature points among all the views are given by the third elements of the 3D point vectors X_metric_trans.
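The following MATLAB sketch summarizes this step; it uses a QR-based RQ decomposition on a flipped matrix instead of the explicit Givens construction described above, and the variable names (P4_metric, X_metric) are illustrative.

    % Hedged sketch of (107)-(118): decompose P4_metric = K [R_H | t_H],
    % build T and read the sparse depths from the transformed points.
    M = P4_metric(:, 1:3);                        % M = K * R_H
    [Qf, Rf] = qr(flipud(M)');                    % RQ decomposition via QR
    K   = flipud(fliplr(Rf'));                    % upper-triangular factor
    R_H = flipud(Qf');                            % orthonormal factor
    S = diag(sign(diag(K)));                      % fix the sign ambiguity
    K = K * S;   R_H = S * R_H;
    t_H = K \ P4_metric(:, 4);                    % (116): t = K^{-1} p_4
    T = [R_H, t_H; 0 0 0 1];                      % (117)
    X_metric_trans = T * X_metric;                % (118), X_metric is 4xN
    depths = X_metric_trans(3, :) ./ X_metric_trans(4, :);   % sparse depths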

4.8 Important Notes

In the implementation of the metric upgrade, due to limitations of the GloptiPoly toolbox, it is not appropriate to use the original projection matrices P in the optimization, because the entries of P are too large for the computations performed by GloptiPoly. A pre-processing step is therefore carried out at the beginning of the 3D modeling to normalize the P matrices by an estimated camera matrix K_estim:

    P_norm = K_estim^{-1} P.                                              (119)

In our implementation, K_estim is set as

    K_estim = [ 0.5 w     0      0
                  0     0.5 h    0
                  0       0      1 ],                                     (120)

where w is the width and h the height of the input images. Since x = P X, normalizing the projection matrices is equivalent to normalizing the 2D points x at the beginning of the implementation, by multiplying them on the left with the inverse of K_estim right after the SIFT step:

    x_norm = K_estim^{-1} x.                                              (121)

In the steps that follow this normalization, the projection matrices calculated from RANSAC are therefore obtained directly in the normalized form P_norm.
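A minimal MATLAB sketch of this normalization is shown below; the 0.5 scaling of the image width and height follows the value printed in the text above, and pts is assumed to be a 3 × N matrix of homogeneous SIFT keypoints.

    % Hedged sketch of (119)-(121): normalize 2D points with K_estim.
    K_estim = [0.5 * w, 0,       0;
               0,       0.5 * h, 0;
               0,       0,       1];      % w, h: image width and height
    pts_norm = K_estim \ pts;             % (121): x_norm = K_estim^{-1} x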

5. EXPERIMENTAL RESULTS

In this chapter, the experimental results of the proposed 3D reconstruction system for different image sets are presented and analyzed. Section 5.1 introduces the image sets used to evaluate the performance of the 2D to 3D conversion system, Section 5.2 illustrates the sparse depth maps obtained for the different image sets, and Section 5.3 presents an analysis of the scale factor in the implemented system.

5.1 Data Set Description

In the implementation of the multi-view 3D reconstruction, four different image sets are used: Egypt_Temple, Tempe_Building, Microsoft_Ballet and Table. The Egypt_Temple image set was taken with a hand-held Canon digital camera in Egypt. The Tempe_Building image set was taken from a Remote Control Airplane (RCA) and shows a bird's-eye view of the buildings in downtown Tempe, Arizona. The Microsoft_Ballet image set was downloaded from Microsoft's website [4]. The Table image set was taken with a hand-held Canon digital camera for objects set up on a table; in addition, ground-truth depth data for the Table image set was collected by measuring the distance of each object from the camera, and this ground-truth data is used to further test the performance of the system. The 8 views of the four image sets are shown in Fig. 11, Fig. 12, Fig. 13 and Fig. 14, respectively; in each figure, the top four views are View 1 to View 4 from left to right and the bottom four views are View 5 to View 8 from left to right. Although multiple views could be obtained using multiple cameras fixed at different viewpoints, the 8 views of the above image sets are taken with the same camera from different angles with respect to the same scene.

Fig. 11. Eight views of Egypt_Temple (size of each view is 512 × 384). The top four views are View 1 to View 4 from left to right, and the bottom four views are View 5 to View 8 from left to right.

Fig. 12. Eight views of Tempe_Building (size of each view is 912 × 684). The top four views are View 1 to View 4 from left to right, and the bottom four views are View 5 to View 8 from left to right.

Fig. 13. Eight views of Microsoft_Ballet (size of each view is 256 × 192). The top four views are View 1 to View 4 from left to right, and the bottom four views are View 5 to View 8 from left to right.

Fig. 14. Eight views of Table. The top four views are View 1 to View 4 from left to right, and the bottom four views are View 5 to View 8 from left to right.

The MATLAB environment is used to implement the system. The multi-view image set is read in and the pixel intensities are converted to the double data type for further computation. For the Tempe_Building, Egypt_Temple and Table image sets, whose original image sizes are too large for MATLAB to process, bilinear down-sampling is used to resize the images to 25% of their original widths and heights. The multi-view frames are stored in a three-dimensional matrix whose third dimension indicates the view number.

5.2 Sparse Depth Map Results

5.2.1 Results for the Egypt_Temple image set

For the Egypt_Temple image set, the matching feature points between the middle view (View 4) and View 1 after SIFT are shown in Fig. 15. There are 87 matching points; due to this large number, only 1/4 of all the matching points are shown in Fig. 15 for illustration, marked and connected with each other to show the matching clearly. The stars in the images indicate the locations of the 2D feature points in each view, and the lines connecting two corresponding 2D points illustrate the matching between the two views. As the matching points calculated by SIFT depend only on the descriptors of the keypoints in a local image region, and are not related through the epipolar geometry between the considered views, there may be some mis-matched feature points between the two views; two such mis-matched connections can be seen in Fig. 15.

The matching inliers between the middle view (View 4) and View 1 after RANSAC are shown in Fig. 16; again, only 1/4 of all the inlier matching points are marked and connected in Fig. 16 to illustrate the matching clearly. Comparing Fig. 15 and Fig. 16, the two mis-matched crossing lines have been removed: RANSAC removes the outliers, and the remaining corresponding feature points are related both by their feature descriptors and by the two-view geometry. The inliers amount to 98% of the corresponding feature points after SIFT.

After feature matching between the middle view and all the other views, the feature points common to all 8 views are identified; they are shown in Fig. 17, where the five views in the upper row are View 1 to View 5 from left to right and the four views in the bottom row are View 5 to View 8 from left to right. Here, only 1/4 of all the matching points are marked and connected to illustrate the matching clearly. It can be seen that the feature points in the different views match each other correctly, and a large number of common feature points is found among the 8 views of the Egypt_Temple image set.

Finally, the depth values of sample common feature points are plotted on the middle view, as shown in Fig. 18. Note that the calculated depth values are actually fractional and are rounded down to the nearest integer for illustration purposes. A larger depth value means that the point is farther from the camera, and a smaller depth value indicates that the point is closer to the camera. As the picture is captured from the right side of the scene, the depth values get larger when observing the feature points from right to left in the horizontal direction.

Fig. 15. Plot of 1/4 of the total number of matching feature points after SIFT between View 4 (left) and View 1 (right) in Egypt_Temple.

Fig. 16. Plot of 1/4 of the total number of matching inliers after RANSAC between View 4 (left) and View 1 (right) in Egypt_Temple.

Fig. 17. Plot of 1/4 of the total number of common feature points in all views of Egypt_Temple. The five views on the top are View 1 to View 5 from left to right; the four views on the bottom are View 5 to View 8 from left to right.

Fig. 18. Plot of 1/4 of the total number of depth values of feature points on the middle view (View 4) of Egypt_Temple. The depth values are rounded down to the nearest integers.

5.2.2 Results for the Microsoft_Ballet image set

After running the system on the Microsoft_Ballet image set, the matching points between the middle view (View 4) and View 1 after SIFT are shown in Fig. 19. There are 44 matching points, and only 1/2 of all the matching points are marked and connected in the figure to illustrate the matching clearly. As the results show, there are some mis-matched points. The matching inliers between the middle view and View 1 after RANSAC are shown in Fig. 20; compared to the SIFT matching in Fig. 19, it is clear that RANSAC helps in removing the outliers. The inliers amount to 5% of the corresponding points after SIFT.

The feature points common to all eight views are shown in Fig. 21. Only very few common feature points are found among the 8 views of Microsoft_Ballet, because there are not many distinctive structures in the multiple views of this image set. The plot of the depth values of the feature points on the middle view is shown in Fig. 22. As before, larger depth values indicate points farther from the camera and vice versa. As the number of common feature points is not large enough, the estimated depth values can be inaccurate in different trials; one example of an inaccurate depth calculation obtained from one run is shown in Fig. 23.

Fig. 19. Plot of 1/2 of the total number of matching feature points after SIFT between View 4 (left) and View 1 (right) in Microsoft_Ballet.

Fig. 20. Plot of 1/2 of the total number of matching feature points after RANSAC between View 4 (left) and View 1 (right) in Microsoft_Ballet.

Fig. 21. Plot of the common feature points in all views of Microsoft_Ballet. The four views on the top are View 1 to View 4 from left to right; the four views on the bottom are View 5 to View 8 from left to right.

Fig. 22. Plot of 1/4 of the total number of depth values of feature points in the middle view (View 4) of Microsoft_Ballet.

Fig. 23. An example of wrongly calculated depth values plotted on the middle view (View 4) of Microsoft_Ballet.

5.2.3 Results for the Tempe_Building image set

Using the Tempe_Building image set as the input to the 3D reconstruction system, the matching points between the middle view (View 4) and View 1 after SIFT are shown in Fig. 24. There are 39 matching points, but only 1/4 of all the matching points are shown in Fig. 24 for clarity. Some mis-matched feature points can also be seen in Fig. 24. The matching inliers between the middle view (View 4) and View 1 after RANSAC are shown in Fig. 25; only 1/4 of all the inlier feature points are shown in this figure for clarity. The inliers amount to 97% of the matching feature points after SIFT.

The feature points common to all 8 views are shown in Fig. 26, where only 1/4 of all the matching points are shown. There are in total 36 corresponding feature points among the eight views of Tempe_Building. Finally, the depth values on the middle view are plotted as shown in Fig. 27. As before, a larger depth value indicates a point farther from the camera and a smaller depth value means that the feature point is closer to the camera center.

5.2.4 Reprojection error results

Table 3 shows the number of common feature points and the average mean square reprojection error over all eight views after bundle adjustment for the three image sets. The number of common feature points in Egypt_Temple and Tempe_Building is large enough to reconstruct the 3D scene correctly, while that in Microsoft_Ballet is not sufficient to estimate the correct depth values all the time.

Fig. 24. Plot of 1/4 of the total number of matching feature points after SIFT between View 4 (left) and View 1 (right) in Tempe_Building.

Fig. 25. Plot of 1/4 of the total number of matching inliers after RANSAC between View 4 (left) and View 1 (right) in Tempe_Building.

Fig. 26. Plot of 1/4 of the total number of common feature points in all views of Tempe_Building. The five views on the top are View 1 to View 5 from left to right; the four views on the bottom are View 5 to View 8 from left to right.

Fig. 27. Plot of 1/4 of the total number of depth values of feature points on the middle view (View 4) of Tempe_Building.

Table 3. Number of common feature points and average mean square reprojection error (over all eight views, after bundle adjustment) for the Egypt_Temple, Microsoft_Ballet and Tempe_Building image sets.

5.3 Analysis of the System Performance

5.3.1 Analysis of the scale factor stability

The 3D scene modeled by this system is a metric reconstruction of the scene. Compared to the 3D scene in a Euclidean frame, the metric reconstruction differs from the Euclidean one by a scale factor s; under the metric reconstruction, the distances of the objects to the camera center are scaled by s with respect to the ground-truth data. The scale factor, and hence the depth values of the feature points, can therefore vary between different runs of the system. As an example, the depth values generated by two different runs on the Egypt_Temple image set are shown in Fig. 28. Although the depth values can differ due to the randomness of the scale factor s, the scale factor is the same for all 3D points within the same run. That is, if the ground-truth depth values of two feature points are D_1 and D_2, the depth values calculated by the 2D to 3D conversion system should be s D_1 and s D_2; for any given pair of feature points, the ratio of their calculated depth values should remain constant across different runs.

Fig. 28. Plot of the depth values of the feature points on the middle view (View 4) at two different iterations for the Egypt_Temple image set.

Table 4. 2D points with minimum and maximum depth, and the ratio of the maximum to the minimum depth value, over 5 iterations on the Egypt_Temple image set (columns: iteration, max depth, max depth position, min depth, min depth position, max/min depth ratio).

The feature points corresponding to the maximum and the minimum depth values are chosen to perform the analysis of the scale factor. Using the Egypt_Temple image set, the coordinates of the maximum-depth and minimum-depth feature points in the middle view (View 4) and the ratio of their depth values for five different iterations are shown in Table 4. From the results in Table 4, it can be seen that the ratio of the maximum depth value to the minimum depth value remains stable over the different runs; the standard deviation of the max-to-min depth ratio is 0.4. It can be concluded that the randomness of the scale factor in the metric transformation is the major cause of the different depth values obtained in different runs.

5.3.2 Effect of RANSAC on the scale factor stability

In the implementation of RANSAC, since the eight feature points used to calculate the fundamental matrix are randomly chosen from all the feature points, the two-view geometry does not remain the same across different runs.

The threshold t that is used to select the inliers among the feature points, as discussed in Section 4.3.1, may also affect the calculated depth values of the feature points. Using the 3D modeling system implemented in this thesis, a total of 15 runs (five for each value of t) are carried out for three different values of the threshold t. The results for the Egypt_Temple image set are similar to those in Table 4; the results for the Tempe_Building image set are shown in Table 5. From the results in Table 5, the ratio of the maximum to the minimum depth value becomes less stable as the value of the threshold t increases. This is because the constraint used to choose the inliers in RANSAC becomes less strict when the threshold increases, which produces more noise in the estimated depth values.

5.3.3 Effect of the reprojection error on the scale factor stability

Comparing the standard deviation of the max-to-min depth ratio for the Egypt_Temple and Tempe_Building image sets, it can be seen that the standard deviation for the Egypt_Temple image set is 0.4, while that for the Tempe_Building image set is much larger than 0.4. The threshold on the average reprojection error t_avg after bundle adjustment, as discussed in Section 4.5.2, also has an effect on the stability of the scale factor s. The threshold t_avg for Tempe_Building is much larger than that for Egypt_Temple, because the minimum reprojection error of the Tempe_Building image set converges to a larger value than that of the Egypt_Temple image set, due to the fact that the structure of the Tempe_Building image set is more complicated.

Table 5. Statistical analysis of the depth values of the feature points in the Tempe_Building image set for 15 runs (five runs for each of the three values of the threshold t), based on the average mean square reprojection error across all views. For each value of t, the table reports, per run, the minimum, maximum, mean and standard deviation of the depth values, the max-to-min depth ratio and the image positions of the maximum-depth and minimum-depth feature points, together with the standard deviation of the max-to-min ratio over the five runs.

As the threshold on the reprojection error gets larger, the average reprojection errors in all the 2D views are larger and, thus, the accuracy of the detected 2D feature points is worse, which introduces more noise into the estimated structure of the 3D scene.

The way the reprojection error is computed in the 2D to 3D conversion system also affects the stability of the scale factor in the metric reconstruction. For example, if a norm other than the average MSE reprojection error over the 2D feature points in all views is used, the estimated depth values are affected, as shown in Table 6. For the results in Table 6, the reprojection error is calculated as follows: first, the average mean square reprojection error of the 2D feature points is calculated for each view, and then the final reprojection error is taken to be the maximum of these per-view averages over all views. It can be seen that using this latter method for computing the reprojection error results in a max-to-min depth ratio that is less stable than that in Table 5.

5.3.4 Evaluation of the system performance with respect to the ground-truth depth

Another important way to analyze the performance of the implemented 3D reconstruction system is to evaluate the calculated depth values with respect to ground-truth data. As discussed in Section 5.3.1, the calculated depth values differ from the ground-truth depth values by a random scale factor s. In order to compare the results with the ground-truth depth, the computed depth values must therefore be scaled to lie within the same range as the ground-truth data.

Table 6. Statistical analysis of the depth values of the feature points in the Tempe_Building image set for 15 runs (five runs for each of the three values of the threshold t), based on the maximum reprojection error across all views. For each value of t, the table reports, per run, the minimum, maximum, mean and standard deviation of the depth values, the max-to-min depth ratio and the image positions of the maximum-depth and minimum-depth feature points, together with the standard deviation of the max-to-min ratio over the five runs.

The scale factor s is estimated using the following method. Given the calculated depth values of the common feature points and the corresponding ground-truth depth values, the first step is to convert both depth data sets to zero-mean data sets by subtracting the average depth value of each data set. Second, for both the zero-mean ground-truth depth data set and the zero-mean calculated depth data set, the average of the positive depth values and the average of the negative depth values are computed separately; these are denoted by avg_truth_pos, avg_truth_neg, avg_calc_pos and avg_calc_neg, respectively. The estimated scale factor s is then computed as

    s = (avg_truth_pos − avg_truth_neg) / (avg_calc_pos − avg_calc_neg).          (122)

Finally, by multiplying the zero-mean calculated depth values by the scale factor s and adding the average of the ground-truth depth values to the scaled zero-mean calculated depth values, the scaled calculated depth values are brought into the same range as the ground-truth depth values.
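A minimal MATLAB sketch of this scaling procedure, under the reading of (122) given above, is shown below; d_calc and d_truth are assumed to be vectors of calculated and ground-truth depths for the same feature points.

    % Hedged sketch of (122): estimate s and rescale the calculated depths.
    d_truth0 = d_truth - mean(d_truth);                 % zero-mean ground truth
    d_calc0  = d_calc  - mean(d_calc);                  % zero-mean calculated depths
    avg_truth_pos = mean(d_truth0(d_truth0 > 0));
    avg_truth_neg = mean(d_truth0(d_truth0 < 0));
    avg_calc_pos  = mean(d_calc0(d_calc0 > 0));
    avg_calc_neg  = mean(d_calc0(d_calc0 < 0));
    s = (avg_truth_pos - avg_truth_neg) / (avg_calc_pos - avg_calc_neg);   % (122)
    d_scaled = s * d_calc0 + mean(d_truth);             % depths in the ground-truth range
    mse = mean((d_scaled - d_truth).^2);                % error against the ground truth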

To study the performance of the implemented system as discussed above, the Table image set, whose multiple views were taken indoors together with manual measurements of the distances from the objects to the camera center, is used for the analysis. Fig. 29 shows 1/5 of the estimated depth values for the middle view (View 4), and the corresponding ground-truth depth values are plotted on the middle view in Fig. 30. The average of the ground-truth depth values of all feature points is 58, and it is 4.3 for the calculated depth values; the unit of the depth values is the inch. After subtracting the average from each depth-value data set, the zero-mean ground-truth depth values and zero-mean calculated depth values are shown in Fig. 31, and the combination of these two plots is shown in Fig. 32. For the zero-mean ground-truth and zero-mean calculated depth data sets, the averages of the positive and of the negative depth values, avg_truth_pos, avg_truth_neg, avg_calc_pos and avg_calc_neg, are computed, and the scale factor s is calculated from them according to (122). After scaling the zero-mean calculated depth values, the zero-mean ground-truth depth values and the scaled calculated depth values are plotted in ascending order in Fig. 33. By adding the average of the ground-truth depth values to the scaled zero-mean calculated depth values, the combined plot of the ground-truth depth values and the scaled calculated depth values is obtained as shown in Fig. 34. The mean square error (MSE) between the ground-truth and the scaled calculated depth values of the feature points in the middle view is 0.869.

Fig. 29. Plot of 1/5 of the total number of calculated depth values for the feature points in the middle view (View 4) of the Table image set.

Fig. 30. Plot of 1/5 of the total number of ground-truth depth values for the feature points in the middle view (View 4) of the Table image set.

Fig. 31. (a) Zero-mean ground-truth depth values, and (b) zero-mean calculated depth values of the feature points detected in the Table image set.


More information

Multi-view 3D Position Estimation of Sports Players

Multi-view 3D Position Estimation of Sports Players Mult-vew 3D Poston Estmaton of Sports Players Robbe Vos and Wlle Brnk Appled Mathematcs Department of Mathematcal Scences Unversty of Stellenbosch, South Afrca Emal: vosrobbe@gmal.com Abstract The problem

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

Computer Animation and Visualisation. Lecture 4. Rigging / Skinning

Computer Animation and Visualisation. Lecture 4. Rigging / Skinning Computer Anmaton and Vsualsaton Lecture 4. Rggng / Sknnng Taku Komura Overvew Sknnng / Rggng Background knowledge Lnear Blendng How to decde weghts? Example-based Method Anatomcal models Sknnng Assume

More information

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces Range mages For many structured lght scanners, the range data forms a hghly regular pattern known as a range mage. he samplng pattern s determned by the specfc scanner. Range mage regstraton 1 Examples

More information

UAV global pose estimation by matching forward-looking aerial images with satellite images

UAV global pose estimation by matching forward-looking aerial images with satellite images The 2009 IEEE/RSJ Internatonal Conference on Intellgent Robots and Systems October -5, 2009 St. Lous, USA UAV global pose estmaton by matchng forward-lookng aeral mages wth satellte mages Kl-Ho Son, Youngbae

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

3D vector computer graphics

3D vector computer graphics 3D vector computer graphcs Paolo Varagnolo: freelance engneer Padova Aprl 2016 Prvate Practce ----------------------------------- 1. Introducton Vector 3D model representaton n computer graphcs requres

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

PROJECTIVE RECONSTRUCTION OF BUILDING SHAPE FROM SILHOUETTE IMAGES ACQUIRED FROM UNCALIBRATED CAMERAS

PROJECTIVE RECONSTRUCTION OF BUILDING SHAPE FROM SILHOUETTE IMAGES ACQUIRED FROM UNCALIBRATED CAMERAS PROJECTIVE RECONSTRUCTION OF BUILDING SHAPE FROM SILHOUETTE IMAGES ACQUIRED FROM UNCALIBRATED CAMERAS Po-Lun La and Alper Ylmaz Photogrammetrc Computer Vson Lab Oho State Unversty, Columbus, Oho, USA -la.138@osu.edu,

More information

New dynamic zoom calibration technique for a stereo-vision based multi-view 3D modeling system

New dynamic zoom calibration technique for a stereo-vision based multi-view 3D modeling system New dynamc oom calbraton technque for a stereo-vson based mult-vew 3D modelng system Tao Xan, Soon-Yong Park, Mural Subbarao Dept. of Electrcal & Computer Engneerng * State Unv. of New York at Stony Brook,

More information

Improved SIFT-Features Matching for Object Recognition

Improved SIFT-Features Matching for Object Recognition Improved SIFT-Features Matchng for Obect Recognton Fara Alhwarn, Chao Wang, Danela Rstć-Durrant, Axel Gräser Insttute of Automaton, Unversty of Bremen, FB / NW Otto-Hahn-Allee D-8359 Bremen Emals: {alhwarn,wang,rstc,ag}@at.un-bremen.de

More information

Brushlet Features for Texture Image Retrieval

Brushlet Features for Texture Image Retrieval DICTA00: Dgtal Image Computng Technques and Applcatons, 1 January 00, Melbourne, Australa 1 Brushlet Features for Texture Image Retreval Chbao Chen and Kap Luk Chan Informaton System Research Lab, School

More information

A Range Image Refinement Technique for Multi-view 3D Model Reconstruction

A Range Image Refinement Technique for Multi-view 3D Model Reconstruction A Range Image Refnement Technque for Mult-vew 3D Model Reconstructon Soon-Yong Park and Mural Subbarao Electrcal and Computer Engneerng State Unversty of New York at Stony Brook, USA E-mal: parksy@ece.sunysb.edu

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION

MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION MULTISPECTRAL IMAGES CLASSIFICATION BASED ON KLT AND ATR AUTOMATIC TARGET RECOGNITION Paulo Quntlano 1 & Antono Santa-Rosa 1 Federal Polce Department, Brasla, Brazl. E-mals: quntlano.pqs@dpf.gov.br and

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

Fitting and Alignment

Fitting and Alignment Fttng and Algnment Computer Vson Ja-Bn Huang, Vrgna Tech Many sldes from S. Lazebnk and D. Hoem Admnstratve Stuffs HW 1 Competton: Edge Detecton Submsson lnk HW 2 wll be posted tonght Due Oct 09 (Mon)

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

Robust Computation and Parametrization of Multiple View. Relations. Oxford University, OX1 3PJ. Gaussian).

Robust Computation and Parametrization of Multiple View. Relations. Oxford University, OX1 3PJ. Gaussian). Robust Computaton and Parametrzaton of Multple Vew Relatons Phl Torr and Andrew Zsserman Robotcs Research Group, Department of Engneerng Scence Oxford Unversty, OX1 3PJ. Abstract A new method s presented

More information

Angle-Independent 3D Reconstruction. Ji Zhang Mireille Boutin Daniel Aliaga

Angle-Independent 3D Reconstruction. Ji Zhang Mireille Boutin Daniel Aliaga Angle-Independent 3D Reconstructon J Zhang Mrelle Boutn Danel Alaga Goal: Structure from Moton To reconstruct the 3D geometry of a scene from a set of pctures (e.g. a move of the scene pont reconstructon

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Feature Extraction and Registration An Overview

Feature Extraction and Registration An Overview Feature Extracton and Regstraton An Overvew S. Seeger, X. Laboureux Char of Optcs, Unversty of Erlangen-Nuremberg, Staudstrasse 7/B2, 91058 Erlangen, Germany Emal: sns@undne.physk.un-erlangen.de, xl@undne.physk.un-erlangen.de

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Real-time Joint Tracking of a Hand Manipulating an Object from RGB-D Input

Real-time Joint Tracking of a Hand Manipulating an Object from RGB-D Input Real-tme Jont Tracng of a Hand Manpulatng an Object from RGB-D Input Srnath Srdhar 1 Franzsa Mueller 1 Mchael Zollhöfer 1 Dan Casas 1 Antt Oulasvrta 2 Chrstan Theobalt 1 1 Max Planc Insttute for Informatcs

More information

Geometric Primitive Refinement for Structured Light Cameras

Geometric Primitive Refinement for Structured Light Cameras Self Archve Verson Cte ths artcle as: Fuersattel, P., Placht, S., Maer, A. Ress, C - Geometrc Prmtve Refnement for Structured Lght Cameras. Machne Vson and Applcatons 2018) 29: 313. Geometrc Prmtve Refnement

More information

LECTURE : MANIFOLD LEARNING

LECTURE : MANIFOLD LEARNING LECTURE : MANIFOLD LEARNING Rta Osadchy Some sldes are due to L.Saul, V. C. Raykar, N. Verma Topcs PCA MDS IsoMap LLE EgenMaps Done! Dmensonalty Reducton Data representaton Inputs are real-valued vectors

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

Calibrating a single camera. Odilon Redon, Cyclops, 1914

Calibrating a single camera. Odilon Redon, Cyclops, 1914 Calbratng a sngle camera Odlon Redon, Cclops, 94 Our goal: Recover o 3D structure Recover o structure rom one mage s nherentl ambguous??? Sngle-vew ambgut Sngle-vew ambgut Rashad Alakbarov shadow sculptures

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Image warping and stitching May 5 th, 2015

Image warping and stitching May 5 th, 2015 Image warpng and sttchng Ma 5 th, 2015 Yong Jae Lee UC Davs PS2 due net Frda Announcements 2 Last tme Interactve segmentaton Feature-based algnment 2D transformatons Affne ft RANSAC 3 1 Algnment problem

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

Resolving Ambiguity in Depth Extraction for Motion Capture using Genetic Algorithm

Resolving Ambiguity in Depth Extraction for Motion Capture using Genetic Algorithm Resolvng Ambguty n Depth Extracton for Moton Capture usng Genetc Algorthm Yn Yee Wa, Ch Kn Chow, Tong Lee Computer Vson and Image Processng Laboratory Dept. of Electronc Engneerng The Chnese Unversty of

More information

Some Tutorial about the Project. Computer Graphics

Some Tutorial about the Project. Computer Graphics Some Tutoral about the Project Lecture 6 Rastersaton, Antalasng, Texture Mappng, I have already covered all the topcs needed to fnsh the 1 st practcal Today, I wll brefly explan how to start workng on

More information

Electrical analysis of light-weight, triangular weave reflector antennas

Electrical analysis of light-weight, triangular weave reflector antennas Electrcal analyss of lght-weght, trangular weave reflector antennas Knud Pontoppdan TICRA Laederstraede 34 DK-121 Copenhagen K Denmark Emal: kp@tcra.com INTRODUCTION The new lght-weght reflector antenna

More information

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach

Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach Angle Estmaton and Correcton of Hand Wrtten, Textual and Large areas of Non-Textual Document Images: A Novel Approach D.R.Ramesh Babu Pyush M Kumat Mahesh D Dhannawat PES Insttute of Technology Research

More information

Hybrid Non-Blind Color Image Watermarking

Hybrid Non-Blind Color Image Watermarking Hybrd Non-Blnd Color Image Watermarkng Ms C.N.Sujatha 1, Dr. P. Satyanarayana 2 1 Assocate Professor, Dept. of ECE, SNIST, Yamnampet, Ghatkesar Hyderabad-501301, Telangana 2 Professor, Dept. of ECE, AITS,

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

Detection of hand grasping an object from complex background based on machine learning co-occurrence of local image feature

Detection of hand grasping an object from complex background based on machine learning co-occurrence of local image feature Detecton of hand graspng an object from complex background based on machne learnng co-occurrence of local mage feature Shnya Moroka, Yasuhro Hramoto, Nobutaka Shmada, Tadash Matsuo, Yoshak Shra Rtsumekan

More information

MOTION BLUR ESTIMATION AT CORNERS

MOTION BLUR ESTIMATION AT CORNERS Gacomo Boracch and Vncenzo Caglot Dpartmento d Elettronca e Informazone, Poltecnco d Mlano, Va Ponzo, 34/5-20133 MILANO boracch@elet.polm.t, caglot@elet.polm.t Keywords: Abstract: Pont Spread Functon Parameter

More information

Accounting for the Use of Different Length Scale Factors in x, y and z Directions

Accounting for the Use of Different Length Scale Factors in x, y and z Directions 1 Accountng for the Use of Dfferent Length Scale Factors n x, y and z Drectons Taha Soch (taha.soch@kcl.ac.uk) Imagng Scences & Bomedcal Engneerng, Kng s College London, The Rayne Insttute, St Thomas Hosptal,

More information

Real-time Motion Capture System Using One Video Camera Based on Color and Edge Distribution

Real-time Motion Capture System Using One Video Camera Based on Color and Edge Distribution Real-tme Moton Capture System Usng One Vdeo Camera Based on Color and Edge Dstrbuton YOSHIAKI AKAZAWA, YOSHIHIRO OKADA, AND KOICHI NIIJIMA Graduate School of Informaton Scence and Electrcal Engneerng,

More information

Amnon Shashua Shai Avidan Michael Werman. The Hebrew University, objects.

Amnon Shashua Shai Avidan Michael Werman. The Hebrew University,   objects. Trajectory Trangulaton over Conc Sectons Amnon Shashua Sha Avdan Mchael Werman Insttute of Computer Scence, The Hebrew Unversty, Jerusalem 91904, Israel e-mal: fshashua,avdan,wermang@cs.huj.ac.l Abstract

More information

Line-based Camera Movement Estimation by Using Parallel Lines in Omnidirectional Video

Line-based Camera Movement Estimation by Using Parallel Lines in Omnidirectional Video 01 IEEE Internatonal Conference on Robotcs and Automaton RverCentre, Sant Paul, Mnnesota, USA May 14-18, 01 Lne-based Camera Movement Estmaton by Usng Parallel Lnes n Omndrectonal Vdeo Ryosuke kawansh,

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Why consder unlabeled samples?. Collectng and labelng large set of samples s costly Gettng recorded speech s free, labelng s tme consumng 2. Classfer could be desgned

More information

Range Data Registration Using Photometric Features

Range Data Registration Using Photometric Features Range Data Regstraton Usng Photometrc Features Joon Kyu Seo, Gregory C. Sharp, and Sang Wook Lee Dept. of Meda Technology, Sogang Unversty, Seoul, Korea Dept. of Radaton Oncology, Massachusetts General

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION

A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION 1 THE PUBLISHING HOUSE PROCEEDINGS OF THE ROMANIAN ACADEMY, Seres A, OF THE ROMANIAN ACADEMY Volume 4, Number 2/2003, pp.000-000 A PATTERN RECOGNITION APPROACH TO IMAGE SEGMENTATION Tudor BARBU Insttute

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

LEAST SQUARES. RANSAC. HOUGH TRANSFORM.

LEAST SQUARES. RANSAC. HOUGH TRANSFORM. LEAS SQUARES. RANSAC. HOUGH RANSFORM. he sldes are from several sources through James Has (Brown); Srnvasa Narasmhan (CMU); Slvo Savarese (U. of Mchgan); Bll Freeman and Antono orralba (MI), ncludng ther

More information

Feature-based image registration using the shape context

Feature-based image registration using the shape context Feature-based mage regstraton usng the shape context LEI HUANG *, ZHEN LI Center for Earth Observaton and Dgtal Earth, Chnese Academy of Scences, Bejng, 100012, Chna Graduate Unversty of Chnese Academy

More information

Pictures at an Exhibition

Pictures at an Exhibition 1 Pctures at an Exhbton Stephane Kwan and Karen Zhu Department of Electrcal Engneerng Stanford Unversty, Stanford, CA 9405 Emal: {skwan1, kyzhu}@stanford.edu Abstract An mage processng algorthm s desgned

More information

CS 231A Computer Vision Midterm

CS 231A Computer Vision Midterm CS 231A Computer Vson Mdterm Tuesday October 30, 2012 Set 1 Multple Choce (22 ponts) Each queston s worth 2 ponts. To dscourage random guessng, 1 pont wll be deducted for a wrong answer on multple choce

More information

An Image Fusion Approach Based on Segmentation Region

An Image Fusion Approach Based on Segmentation Region Rong Wang, L-Qun Gao, Shu Yang, Yu-Hua Cha, and Yan-Chun Lu An Image Fuson Approach Based On Segmentaton Regon An Image Fuson Approach Based on Segmentaton Regon Rong Wang, L-Qun Gao, Shu Yang 3, Yu-Hua

More information

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016)

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016) Technsche Unverstät München WSe 6/7 Insttut für Informatk Prof. Dr. Thomas Huckle Dpl.-Math. Benjamn Uekermann Parallel Numercs Exercse : Prevous Exam Questons Precondtonng & Iteratve Solvers (From 6)

More information

Image Matching Algorithm based on Feature-point and DAISY Descriptor

Image Matching Algorithm based on Feature-point and DAISY Descriptor JOURNAL OF MULTIMEDIA, VOL. 9, NO. 6, JUNE 2014 829 Image Matchng Algorthm based on Feature-pont and DAISY Descrptor L L School of Busness, Schuan Agrcultural Unversty, Schuan Dujanyan 611830, Chna Abstract

More information

Unsupervised Learning

Unsupervised Learning Pattern Recognton Lecture 8 Outlne Introducton Unsupervsed Learnng Parametrc VS Non-Parametrc Approach Mxture of Denstes Maxmum-Lkelhood Estmates Clusterng Prof. Danel Yeung School of Computer Scence and

More information

Shape Representation Robust to the Sketching Order Using Distance Map and Direction Histogram

Shape Representation Robust to the Sketching Order Using Distance Map and Direction Histogram Shape Representaton Robust to the Sketchng Order Usng Dstance Map and Drecton Hstogram Department of Computer Scence Yonse Unversty Kwon Yun CONTENTS Revew Topc Proposed Method System Overvew Sketch Normalzaton

More information

Fitting: Deformable contours April 26 th, 2018

Fitting: Deformable contours April 26 th, 2018 4/6/08 Fttng: Deformable contours Aprl 6 th, 08 Yong Jae Lee UC Davs Recap so far: Groupng and Fttng Goal: move from array of pxel values (or flter outputs) to a collecton of regons, objects, and shapes.

More information

Vanishing Hull. Jinhui Hu, Suya You, Ulrich Neumann University of Southern California {jinhuihu,suyay,

Vanishing Hull. Jinhui Hu, Suya You, Ulrich Neumann University of Southern California {jinhuihu,suyay, Vanshng Hull Jnhu Hu Suya You Ulrch Neumann Unversty of Southern Calforna {jnhuhusuyay uneumann}@graphcs.usc.edu Abstract Vanshng ponts are valuable n many vson tasks such as orentaton estmaton pose recovery

More information

Corner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity

Corner-Based Image Alignment using Pyramid Structure with Gradient Vector Similarity Journal of Sgnal and Informaton Processng, 013, 4, 114-119 do:10.436/jsp.013.43b00 Publshed Onlne August 013 (http://www.scrp.org/journal/jsp) Corner-Based Image Algnment usng Pyramd Structure wth Gradent

More information

Development of an Active Shape Model. Using the Discrete Cosine Transform

Development of an Active Shape Model. Using the Discrete Cosine Transform Development of an Actve Shape Model Usng the Dscrete Cosne Transform Kotaro Yasuda A Thess n The Department of Electrcal and Computer Engneerng Presented n Partal Fulfllment of the Requrements for the

More information

Model-Based Bundle Adjustment to Face Modeling

Model-Based Bundle Adjustment to Face Modeling Model-Based Bundle Adjustment to Face Modelng Oscar K. Au Ivor W. sang Shrley Y. Wong oscarau@cs.ust.hk vor@cs.ust.hk shrleyw@cs.ust.hk he Hong Kong Unversty of Scence and echnology Realstc facal synthess

More information

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements Module 3: Element Propertes Lecture : Lagrange and Serendpty Elements 5 In last lecture note, the nterpolaton functons are derved on the bass of assumed polynomal from Pascal s trangle for the fled varable.

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

High resolution 3D Tau-p transform by matching pursuit Weiping Cao* and Warren S. Ross, Shearwater GeoServices

High resolution 3D Tau-p transform by matching pursuit Weiping Cao* and Warren S. Ross, Shearwater GeoServices Hgh resoluton 3D Tau-p transform by matchng pursut Wepng Cao* and Warren S. Ross, Shearwater GeoServces Summary The 3D Tau-p transform s of vtal sgnfcance for processng sesmc data acqured wth modern wde

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Kinematics of pantograph masts

Kinematics of pantograph masts Abstract Spacecraft Mechansms Group, ISRO Satellte Centre, Arport Road, Bangalore 560 07, Emal:bpn@sac.ernet.n Flght Dynamcs Dvson, ISRO Satellte Centre, Arport Road, Bangalore 560 07 Emal:pandyan@sac.ernet.n

More information