Discrete-Continuous Depth Estimation from a Single Image


Miaomiao Liu, Mathieu Salzmann, Xuming He
NICTA and CECS, ANU, Canberra
{miaomiao.liu, mathieu.salzmann, xuming.he}@nicta.com.au

Abstract

In this paper, we tackle the problem of estimating the depth of a scene from a single image. This is a challenging task, since a single image on its own does not provide any depth cue. To address this, we exploit the availability of a pool of images for which the depth is known. More specifically, we formulate monocular depth estimation as a discrete-continuous optimization problem, where the continuous variables encode the depth of the superpixels in the input image, and the discrete ones represent relationships between neighboring superpixels. The solution to this discrete-continuous optimization problem is then obtained by performing inference in a graphical model using particle belief propagation. The unary potentials in this graphical model are computed by making use of the images with known depth. We demonstrate the effectiveness of our model in both the indoor and outdoor scenarios. Our experimental evaluation shows that our depth estimates are more accurate than those of existing methods on standard datasets.

1. Introduction

In this paper, we address the problem of scene depth estimation from a single image. Estimating the depth of a general scene from a monocular, static viewpoint is a very challenging task, since no reliable cues, such as stereo correspondences or motion, can be exploited. In recent years, much progress has been made towards accurate 3D scene reconstruction from single images. For instance, simple geometric assumptions (i.e., box models) have proven effective for estimating the layout of a room [9, 17, 27]. Similarly, for outdoor scenes, the Manhattan, or blocks world, assumption has been utilized to perform 3D scene layout estimation [7]. These box models, however, can only represent simple structures, and are therefore ill-suited to obtaining detailed 3D reconstructions.
NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the ARC through the ICT Centre of Excellence program.

Figure 1. Depth estimation from a single image: input images and depth maps estimated by our method.

In contrast, several methods have been proposed to directly estimate the depth of image (super)pixels [24, 25]. In this context, it was shown that exploiting additional sources of information, such as user annotations [22], semantic labels [18], or the presence of repetitive structures [30], could help improve reconstruction accuracy. Unfortunately, such additional information is not available in general. Recently, nonparametric approaches were therefore introduced to handle this case [13, 15, 16]. Given an input image, these approaches proceed by retrieving similar images in a pool of images for which the depth is known. The depths of the retrieved candidates are then employed in conjunction with smoothness constraints to estimate a depth map. While this has achieved some success, as suggested in [31] in the context of stereo, the gradient-aware smoothing strategy often poorly reflects the real 3D scene observed in the image. In this paper, we introduce a method that addresses this issue by modeling depth estimation as a discrete-continuous optimization problem. In particular, in addition to the standard continuous variables that encode the depth of the superpixels in the input image, we make use of discrete variables that allow us to model complex relationships between

neighboring superpixels. Depth estimation can then be expressed as inference in a higher-order, discrete-continuous graphical model. More specifically, given an input image, we make use of a nonparametric approach to retrieve similar images in a dataset of images with known depth. We exploit the depths of these images to construct data terms for the continuous variables in our model. Furthermore, we employ discrete variables to encode the occlusion relationships between neighboring superpixels. The interactions of several discrete variables can then be expressed with junction potentials, which define invalid configurations. These discrete occlusion variables also let us define smoothness constraints that better reflect realistic scenes. We make use of particle belief propagation [12, 20] to perform inference in the resulting higher-order, discrete-continuous graphical model. In contrast to most existing methods, which typically consider either indoor scenes or outdoor ones, we demonstrate the effectiveness of our model in both scenarios. Our experiments show the benefits of our discrete-continuous formulation, which yields state-of-the-art accuracies on the NYU v2 indoor scenes dataset [28] and on Make3D [25].

2. Related Work

Estimating the depth of a scene from images is one of the major goals of computer vision, and has therefore attracted a lot of attention over the years. Here, we focus on the advances made in recent years. A classical approach to reconstructing 3D scenes consists of exploiting video. In this context, 3D scene flow is one of the most popular and mature approaches to depth estimation [3, 4, 29]. Similarly, structure-from-motion [5] and SLAM [19] have now reached the stage where existing systems can efficiently handle huge numbers of images. Therefore, they are now being integrated into 3D scene understanding methods that jointly detect or segment objects while performing 3D reconstruction [6, 8, 23].
While here we tackle the monocular, static case, it was shown that the depth maps obtained from single images could also have a beneficial impact on video-based 3D reconstruction [13]. When it comes to single images, 3D reconstruction methods have not yet attained the same degree of maturity as video-based techniques. Nonetheless, much progress has been made in recent years. In particular, for indoor scenes, effective techniques have been proposed to estimate the layout of rooms. These methods typically rely on box-shaped models, and try to fit the box edges to those observed in the image [9, 17, 27]. The same simple geometric prior, blocks world, was exploited in outdoor scenes [7]. In [10], a more accurate geometric model was employed, but the results remain only a rough estimate of surface normals.

The simple geometric models described above do not allow us to obtain a detailed 3D description of the scene. In contrast, several methods have proposed to directly estimate the depth of image (super)pixels. Since a single image typically does not provide enough information to estimate depth, other sources of information have been exploited. In particular, in [22], depth was predicted from user annotations. In [18], this was achieved by making use of semantic class labels. Alternatively, the presence of repetitive structures in the scene was also employed for 3D reconstruction [30]. With the recent popularity of depth sensors, sparse depths have also been used to estimate denser depth maps [2]. In this work, however, we focus on the scenario where no such sources of information are available. In this setting, supervised learning techniques were the first to provide realistic results by learning the parameters of a Markov random field [24, 25]. More recently, several nonparametric approaches were introduced [13, 15, 16]. These methods exploit the availability of a set of images for which the depths are known. Depth in the input image is then estimated by first retrieving similar images in this set, and optionally warping their depth using SIFT flow.
These (warped) depth maps are then utilized in the objective function of a nonlinear optimization problem that encourages the resulting depth to be smooth. Our work is close in spirit to that of [13, 15, 16] in the sense that we also make use of a nonparametric approach to retrieve candidate depth maps. However, we avoid the warping process of [13, 15], which is computationally expensive and does not necessarily improve the quality of the candidates. More importantly, we introduce the use of discrete variables that allow us to model more complex relationships between neighboring superpixels, and formulate depth estimation as inference in a discrete-continuous graphical model. As evidenced by our results, this formulation is beneficial in terms of accuracy of the estimated depth, and proved effective for both the indoor and outdoor scenarios.

3. Discrete-Continuous Depth Estimation

We now describe our approach to depth estimation from a single image. To this end, we first derive the Conditional Random Field (CRF) that defines our problem, and discuss the inference method that we use. We then define the different potentials utilized in our model.

3.1. Discrete-Continuous CRF

Our goal is to estimate the depth of the pixels observed in a single image depicting a general scene. We formulate this problem in terms of superpixels, making the common assumption that each superpixel is planar. The pose of a superpixel is then expressed in terms of the depth of its centroid and its plane normal. Furthermore, we make use of additional discrete variables that encode the relationship of two neighboring superpixels. In particular, here, we consider four types of relationships, encoding the fact that the two superpixels (i) belong to the same object; (ii) belong to two different but connected objects; (iii) belong to two objects that form a left occlusion; and (iv) belong to two objects that form a right occlusion. Here, the notion of left and right occlusions follows the formalism of [11] based on edge directions.

Given these variables, we express depth estimation as an inference problem in a discrete-continuous CRF. More specifically, let $Y = \{y_1, \dots, y_S\}$ be the set of continuous variables, where each $y_i \in \mathbb{R}^4$ concatenates the centroid depth and plane normal of superpixel $i$, and where $S$ is the total number of superpixels in the input image. Furthermore, let $E = \{e_p\}_{p \in \mathcal{E}}$ be the set of discrete variables, where each $e_p \in \{so, co, lo, ro\}$ indicates same object (so), connected but different objects (co), left occlusion (lo) and right occlusion (ro), respectively, and $\mathcal{E}$ is the set of pairs of superpixels that share a common boundary. Given these variables, we then form a CRF, where the joint distribution over the random variables factorizes into a product of non-negative potentials. This joint distribution can be written as

$$p(Y, E) = \frac{1}{Z} \prod_i \Psi_i(y_i) \prod_\alpha \Psi_\alpha(y_\alpha, e_\alpha) \prod_\beta \Psi_\beta(e_\beta),$$

where $Z$ is a normalization constant, i.e., the partition function, $\Psi_i$ is a unary potential function over the continuous variables that defines the data term for depth, and $\Psi_\alpha$ and $\Psi_\beta$ are potentials over mixed variables and discrete variables, respectively, which encode the smoothness and consistency between depth and edge types.

Inference in the graphical model is then performed by computing a MAP estimate. By working with negative log potential functions, e.g., $\phi_i(y_i) = -\ln(\Psi_i(y_i))$, this can be expressed as the optimization problem

$$(Y^*, E^*) = \operatorname*{argmin}_{Y, E} \; \sum_i \phi_i(y_i) + \sum_\alpha \phi_\alpha(y_\alpha, e_\alpha) + \sum_\beta \phi_\beta(e_\beta). \qquad (1)$$

The potentials that we use here are discussed in Section 3.2. To handle mixed discrete and continuous variables, we make use of particle (convex) belief propagation (PCBP) [20], which lets us obtain an approximate solution to the optimization problem (1).
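To make the MAP objective above concrete, the following minimal sketch evaluates the sum of negative log potentials for a joint assignment and minimizes it by exhaustive search over a tiny set of candidate states. It is an illustration under stated assumptions, not the inference procedure of the paper: exhaustive search stands in for convex belief propagation, the higher-order junction terms are omitted, continuous states are reduced to scalar depths, and all names (`energy`, `brute_force_map`, the toy potentials) are hypothetical.

```python
import itertools

EDGE_STATES = ("so", "co", "lo", "ro")


def energy(Y, E, edges, unary, mixed, edge_unary):
    """Sum of negative log potentials for one joint assignment.

    Y: one continuous state per superpixel.
    E: one edge label per entry of `edges` (pairs of superpixel indices).
    Junction (three-edge) terms are left out of this sketch.
    """
    total = sum(unary(i, y) for i, y in enumerate(Y))
    for (i, j), e in zip(edges, E):
        total += mixed(Y[i], Y[j], e) + edge_unary((i, j), e)
    return total


def brute_force_map(particles, edges, unary, mixed, edge_unary):
    """Exhaustive minimization over per-superpixel particles and edge labels.

    Only feasible for toy problems; it stands in for the approximate
    discrete MAP solver used in practice.
    """
    best = None
    for Y in itertools.product(*particles):
        for E in itertools.product(EDGE_STATES, repeat=len(edges)):
            value = energy(Y, E, edges, unary, mixed, edge_unary)
            if best is None or value < best[0]:
                best = (value, list(Y), list(E))
    return best


# Toy problem (all values hypothetical): two superpixels, scalar depths.
targets = [1.0, 3.0]
particles = [[1.0, 2.0], [1.0, 3.0]]
edges = [(0, 1)]
unary = lambda i, y: (y - targets[i]) ** 2
# Smoothness applies for "so"/"co"; occlusion labels pay a fixed cost instead.
mixed = lambda yi, yj, e: (yi - yj) ** 2 if e in ("so", "co") else 0.5
edge_unary = lambda p, e: 0.0

best = brute_force_map(particles, edges, unary, mixed, edge_unary)
print(best)
```

In this toy setting the depth discontinuity between the two superpixels is cheaper to explain with an occlusion label than with a smoothness penalty, which is the kind of trade-off the discrete variables are meant to capture.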
More specifically, PCBP proceeds by iteratively solving the following steps:

1. Draw $N_s$ random samples $\{y_i^j\}$, $1 \le j \le N_s$, around the previous MAP solution for each variable $y_i$.
2. Compute the (approximate) MAP solution of the discrete CRF formed by the discrete variables $\{e_p\}$ and by utilizing the random samples $\{y_i^j\}$ as discrete states for the variables $\{y_i\}$.

In practice, we draw samples for the plane normal of the superpixels according to a Fisher-Bingham distribution, which forces them to have unit norm. Samples for the depth of the centroid of each superpixel are drawn according to a Gaussian distribution. At each iteration, we tighten the sampling around the previous MAP solution. The approximate MAP of the discrete CRF is obtained by distributed convex belief propagation [26].

In this work, we make use of a nonparametric approach to obtain a reasonable initialization for the algorithm. In particular, we retrieve the $K$ images most similar to the input image from a set of images for which the depth is known. To this end, we perform a nearest-neighbor search based on concatenated GIST, PHOG and Object Bank features and directly make use of the depths of the retrieved images, i.e., in contrast to [13, 15], we do not warp the depth of the retrieved images. The $K$ retrieved depth maps then directly act as states in the first round of PCBP, i.e., no random samples are used in this round. In the next section, we describe the specific potentials used in the optimization problem (1).

3.2. Depth and Occlusion Potentials

The objective function in (1) contains three different types of potentials involving, respectively, continuous variables only, discrete and continuous variables, and discrete variables only. Below, we discuss the functions used in these three types of potentials.

Potentials for continuous variables: The potentials involving purely continuous variables are unary potentials, and are of two different kinds. For the first one, we exploit the $K$ candidates retrieved by the image-based nearest-neighbor strategy mentioned in the previous section.
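The particle-generation step of PCBP described in Section 3.1 can be sketched as follows. This is a simplified illustration under several assumptions: the Fisher-Bingham sampler is replaced by a Gaussian perturbation of the normal followed by renormalization (a plainly different, simpler distribution that still yields unit-norm normals), the discrete MAP step is replaced by picking the lowest-energy particle of a single variable, and the function names, the spread parameters and the `shrink` factor are hypothetical.

```python
import math
import random


def sample_particles(y_map, n_samples, depth_sigma, normal_sigma, rng):
    """Draw particles around the previous MAP state y_map = (depth, nx, ny, nz).

    Depth is perturbed with Gaussian noise. The normal is perturbed with
    Gaussian noise and renormalized, a stand-in for Fisher-Bingham sampling.
    """
    depth, nx, ny, nz = y_map
    particles = []
    for _ in range(n_samples):
        d = rng.gauss(depth, depth_sigma)
        n = [rng.gauss(c, normal_sigma) for c in (nx, ny, nz)]
        norm = math.sqrt(sum(c * c for c in n))
        if norm == 0.0:
            # Degenerate draw: keep the previous (unit) normal.
            particles.append((d, nx, ny, nz))
        else:
            particles.append((d, n[0] / norm, n[1] / norm, n[2] / norm))
    return particles


def refine(y0, energy, n_rounds=2, n_samples=20,
           depth_sigma=1.0, normal_sigma=0.2, shrink=0.5, seed=0):
    """Alternate sampling and selection for one variable.

    The previous MAP state is kept among the candidates, so the energy
    never increases; the sampling spread is tightened after every round.
    """
    rng = random.Random(seed)
    y_map = y0
    for _ in range(n_rounds):
        candidates = [y_map] + sample_particles(
            y_map, n_samples, depth_sigma, normal_sigma, rng)
        y_map = min(candidates, key=energy)
        depth_sigma *= shrink
        normal_sigma *= shrink
    return y_map


# Toy usage: pull the centroid depth towards 5.0 (hypothetical energy).
toy_energy = lambda y: (y[0] - 5.0) ** 2
y_start = (4.0, 0.0, 0.0, 1.0)
y_final = refine(y_start, toy_energy)
print(y_final)
```

In the full model the selection step is a joint MAP over all superpixels and edge variables rather than an independent per-variable minimum; the sketch only shows how particles are proposed and how the proposal distribution is tightened.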
The first potential encodes the fact that the final depth should remain close to at least one candidate. To this end, we make use of the squared depth difference. More specifically, assuming a calibrated camera, the depth $d_i^{u_j}$ of pixel $u_j = (u_j, v_j)$ in superpixel $i$ can be obtained by intersecting the visual ray passing through $u_j$ with the plane defined by $y_i$. This lets us write the potential

$$\phi_i^c(y_i) = \min_{k = 1, \dots, K} \frac{1}{N_i^p} \sum_{j=1}^{N_i^p} \left( d_i^{u_j}(y_i) - d_{k,i}^{u_j} \right)^2, \qquad (2)$$

where $N_i^p$ is the number of pixels in superpixel $i$, and $d_{k,i}^{u_j}$ denotes the depth of the $k$-th candidate for superpixel $i$ at pixel $u_j$. In practice, instead of directly using the candidate depth, we fit a plane to the candidate superpixels and use the intersection of this plane with the visual rays. This provides some robustness to noise in the candidates.

As a second unary potential for the continuous variables, we also make use of the candidate depths, but in a less direct manner. More specifically, we train four different Gaussian Process (GP) regressors, each corresponding to one dimension of the variable $y_i$. The input to each regressor is composed of the corresponding measurement of the candidates for superpixel $i$. We found these inputs to be more reliable than image features. For each GP, we used an RBF kernel with width set to the median squared distance computed over all the training samples. For more details on GP regression, we refer the reader to [21]. Given the regressed value $y_i^r$ for superpixel $i$, we compute the depth $d_{r,i}^{u_j}$ at each pixel $u_j$ in the same manner as before, and write the potential

$$\phi_i^r(y_i) = w_r \frac{1}{N_i^p} \sum_{j=1}^{N_i^p} \left( d_i^{u_j}(y_i) - d_{r,i}^{u_j} \right)^2, \qquad (3)$$

where $w_r$ is the weight of this potential relative to $\phi_i^c(y_i)$. In practice, we also use the regressed value $y_i^r$ as a state for superpixel $i$ in the first round of PCBP, where no sampling is performed.

Potential for mixed variables: Our model also exploits a potential that involves both continuous and discrete variables. In particular, we define a potential that encodes the compatibility of two superpixels that share a common boundary and the corresponding discrete variable. This potential can be expressed as

$$\phi_{i,j}^m(y_i, y_j, e_{i,j}) = w_m \begin{cases} g_{i,j} \, \| n_i - n_j \|^2 + \dfrac{1}{N_{i,j}^b} \sum_{m=1}^{N_{i,j}^b} \left( d_i^{u_m}(y_i) - d_j^{u_m}(y_j) \right)^2 & \text{if } e_{i,j} = so, \\ \dfrac{1}{N_{i,j}^b} \sum_{m=1}^{N_{i,j}^b} \left( d_i^{u_m}(y_i) - d_j^{u_m}(y_j) \right)^2 & \text{if } e_{i,j} = co, \\ \phi_{i,j}^o(y_i, y_j, e_{i,j}) & \text{otherwise,} \end{cases}$$

where $w_m$ is the weight of this potential, $n_i$ is the plane normal of superpixel $i$, i.e., three components of $y_i$, $N_{i,j}^b$ is the number of pixels shared along the boundary between superpixel $i$ and superpixel $j$, and $g_{i,j}$ is a weight based on the image gradient at the boundary between superpixels $i$ and $j$, i.e., $g_{i,j} = \exp(-\mu_{i,j} / \sigma)$, with $\mu_{i,j}$ the mean gradient along the boundary between the two superpixels. To handle the occlusion cases, the function $\phi_{i,j}^o(y_i, y_j, e_{i,j})$ assigns a cost of 0 if the two superpixels are in a configuration that agrees with the state of $e_{i,j}$, i.e., left occlusion or right occlusion, and a cost of $\theta_{max}$ otherwise.
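The plane-induced depth that the continuous potentials rely on, and the candidate term of Eq. (2), can be sketched as follows. The sketch assumes a pinhole camera with focal length `f` and principal point `(cx, cy)`, takes depth to be the z-coordinate of a 3D point, and anchors the plane at the back-projected superpixel centroid; these conventions, and all function names, are assumptions made for illustration, since the text only states that the camera is calibrated.

```python
def ray(u, v, f, cx, cy):
    """Back-projected viewing ray of pixel (u, v), scaled so that z = 1."""
    return ((u - cx) / f, (v - cy) / f, 1.0)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pixel_depth(y, centroid, pixel, f, cx, cy):
    """Depth at `pixel` of the plane encoded by y = (d_c, nx, ny, nz).

    The plane passes through the 3D point at depth d_c along the centroid
    ray and has normal n. Intersecting it with the pixel ray t * r gives
    t = (n . X_c) / (n . r), which is the depth because r has z = 1.
    """
    d_c, n = y[0], y[1:4]
    x_c = tuple(d_c * c for c in ray(centroid[0], centroid[1], f, cx, cy))
    r = ray(pixel[0], pixel[1], f, cx, cy)
    denom = dot(n, r)
    if abs(denom) < 1e-12:
        raise ValueError("viewing ray is parallel to the superpixel plane")
    return dot(n, x_c) / denom


def candidate_potential(y, centroid, pixels, candidate_depths, f, cx, cy):
    """Eq. (2): smallest mean squared depth difference over the K candidates.

    candidate_depths[k][j] is the depth of candidate k at pixels[j].
    """
    depths = [pixel_depth(y, centroid, p, f, cx, cy) for p in pixels]
    costs = []
    for cand in candidate_depths:
        sq = [(d - c) ** 2 for d, c in zip(depths, cand)]
        costs.append(sum(sq) / len(sq))
    return min(costs)


# Toy usage: a fronto-parallel plane at depth 2 has depth 2 at every pixel.
y_plane = (2.0, 0.0, 0.0, 1.0)
pixels = [(10.0, 20.0), (30.0, 40.0)]
cost = candidate_potential(y_plane, (20.0, 30.0), pixels,
                           [[3.0, 3.0], [2.0, 2.0]], f=500.0, cx=0.0, cy=0.0)
print(cost)
```

The regression term of Eq. (3) has the same form with a single reference depth map and the weight $w_r$, so it can reuse `pixel_depth` unchanged.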
While this potential depends on three variables, it remains fast to compute, since $e_{i,j}$ can only take four states.

Potentials for discrete variables: Finally, we use two different potentials that only involve discrete variables. The first one is a unary potential that makes use of a classifier trained to discriminate between occlusion (i.e., lo or ro) and non-occlusion (i.e., so or co) cases. To this end, we utilize the image-based occlusion cues introduced in [11] and employ a binary boosted decision tree classifier. Given the prediction of the classifier $\hat{e}_p$, our potential function takes the form

$$\phi_p^t(e_p) = \begin{cases} -\theta_t & \text{if } e_p \text{ agrees with } \hat{e}_p, \\ \theta_t & \text{otherwise,} \end{cases} \qquad (4)$$

where $\theta_t$ is a parameter of our model. Note that distinguishing between all four types of edge variables proved too unreliable, which motivated our decision to only consider occlusion vs. non-occlusion.

The second purely discrete potential is similar to the junction feasibility potential used in [31] for stereo. More specifically, it encodes information about whether the junction between three edge variables is physically possible or not. Therefore, this potential takes the form

$$\phi_{p,q,r}^t(e_p, e_q, e_r) = \begin{cases} \theta_{max} & \text{if impossible case,} \\ 0 & \text{otherwise.} \end{cases} \qquad (5)$$

Here, we employed the same impossible cases as in [31] for our four states, assuming that co edges typically form a hinge, while so edges are mostly coplanar. Note that, here, we only considered junctions of three superpixels, since junctions of four occur very rarely. However, 4-junctions could easily be introduced in our framework.

4. Experimental Evaluation

We now present our experimental results on depth estimation in outdoor and indoor scenes. In particular, we evaluated our method on two publicly available datasets: the Make3D range image dataset [25] and the NYU v2 Kinect dataset [28]. For both datasets, we compare our results with those of the depth transfer method of [13], which represents the current state of the art for depth estimation from a single image.
In addition to the baseline [13], we also evaluate the results of (i) our unary terms only; (ii) our GP depth regressors on their own; (iii) our model without discrete variables, using the same pairwise term as in the $e_{i,j} = so$ case; and (iv) the first approximate MAP in our model, obtained before sampling particles in PCBP. For our quantitative evaluation, we report errors obtained with the following three commonly used metrics:

- average relative error (rel): $\frac{1}{N} \sum_u \frac{|g_u - d_u|}{g_u}$,
- average $\log_{10}$ error: $\frac{1}{N} \sum_u \left| \log_{10} g_u - \log_{10} d_u \right|$,
- root mean squared error (rms): $\sqrt{\frac{1}{N} \sum_u (g_u - d_u)^2}$,

where $g_u$ is the ground-truth depth at pixel $u$, $d_u$ is the corresponding estimated depth, and $N$ denotes the total number of pixels in all the images.
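The three error metrics translate directly into code. The sketch below is a minimal pure-Python version operating on flat lists of per-pixel depths; the function name and the rule of skipping pixels with non-positive ground truth or estimate (needed for the division and the logarithm) are assumptions of this example.

```python
import math


def depth_errors(gt, pred):
    """Return (rel, log10, rms) over pixels with positive depth.

    gt and pred are equally long sequences of ground-truth and
    estimated depths, pooled over all evaluated images.
    """
    pairs = [(g, d) for g, d in zip(gt, pred) if g > 0 and d > 0]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    n = len(pairs)
    rel = sum(abs(g - d) / g for g, d in pairs) / n
    log10 = sum(abs(math.log10(g) - math.log10(d)) for g, d in pairs) / n
    rms = math.sqrt(sum((g - d) ** 2 for g, d in pairs) / n)
    return rel, log10, rms


# Toy usage with two pixels (hypothetical values).
rel, log10_err, rms = depth_errors([2.0, 4.0], [2.0, 2.0])
print(rel, log10_err, rms)
```

Restricting the evaluation to a subset of pixels, such as a region below a ground-truth depth threshold, amounts to filtering `gt` and `pred` with the same mask before calling the function.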

Figure 2. Qualitative comparison of the depths estimated with depth transfer [13] and with our method on the Make3D dataset. Panels: our method, depth transfer [13], ground truth, image. Color indicates depth (red is far, blue is close).

Table 1. Depth reconstruction errors (rel, log10, rms) on the Make3D dataset for depth transfer [13] and for our method, evaluated on two criteria (C1 and C2, see text for details).

In both experiments, we used SLIC [1] to compute the superpixels. For each test image, we retrieved $K = 7$ candidates from the training images. The parameters of our model were selected using a small validation set of 10 images from the NYU v2 dataset and kept the same in both experiments. The specific values were $w_r = 1$, $w_m = 10$, $\theta_t = 10$, and $\theta_{max} = 20$. Note that these parameters could, in principle, be learned. However, our approach proved robust enough for this to be unnecessary. We performed two iterations of PCBP with $N_s = 20$ particles at each iteration.

Table 2. Make3D: Comparison of our final results with those obtained with unary terms only, with our GP depth regressors only, using a model without discrete edge type variables, and after the first round of PCBP where no sampling is involved.

Method                  rel     log10   rms
Unary                   0.352   0.142
GP regression
No discrete variables
No sampling
Full model

Figure 3. Make3D: Depth maps at different stages of our approach. Panels: ground truth, regression, no sampling, sampling round, final depth.

Table 3. Depth reconstruction errors on the NYU v2 dataset for depth transfer [13] and for our method using the training/test partition provided with the dataset.

Method                  rel     log10   rms
Depth transfer [13]     0.374   0.134   1.12
Our method              0.335   0.127

Table 4. Comparison of the depth estimation errors on the NYU v2 dataset using a leave-one-out strategy.

Method                        rel     log10   rms
Depth fusion (no warp) [16]   0.371   0.137   1.3
Depth fusion [15]             0.368   0.135   1.3
Depth transfer                0.350   0.131   1.2
Our method                    0.327   0.126   1.08

4.1. Outdoor Scene Reconstruction: Make3D

The Make3D dataset contains 534 images with corresponding depth maps, partitioned into 400 training images and 134 test images. All the images were resized so as to preserve the aspect ratio of the original images. Since the true focal length of the camera is unknown, we assume a reasonable value of 500 for the resized images. Due to the limited range and resolution of the sensor used to collect the ground truth, far-away pixels were arbitrarily set to depth 80 in the original dataset. To take this, as well as the effect of interpolation when resizing the images, into account in our evaluation, we report errors based on two different criteria: (C1) errors are computed in the regions with ground-truth depth less than 70; (C2) errors are computed in the entire image. In this second scenario, to reduce the effect of meaningless candidates in sky regions, we used a classifier to label sky pixels and forced the depth of the corresponding superpixels to take the value (0, 0, 1, 80). Note that the same two criteria (C1 and C2) were used to evaluate the baseline.

In Table 1, we compare the results of our approach with those obtained by depth transfer [13]. Note that, using criterion C1, we outperform the baseline in terms of relative error and perform slightly worse for the other metrics. Using criterion C2, we outperform the baseline for all metrics.

Table 5. NYU v2: Comparison of our final results (rel, log10, rms) with those obtained with unary terms only, with our GP depth regressors only, using a model without discrete edge type variables, and after the first round of PCBP where no sampling is involved. Rows: Unary, GP regression, No discrete variables, No sampling, Full model.

Fig. 2 provides a qualitative comparison of our depth maps with those estimated by depth transfer [13] for some images of the dataset. Note that depth transfer tends to oversmooth the depth maps and, e.g., merge foreground objects with the background. Thanks to our discrete variables, our approach better respects the discontinuities in the scene. In Table 2, we show the results obtained with some of the parts of our model. Note that, even though the sampling in PCBP does not seem to have a great impact on the errors, it helps smooth the depth maps and thus makes them look more realistic. This is evidenced by Fig. 3, where we show the depth maps at different stages of our approach. Note that the influence of each stage is more easily seen with NYU v2 (see Fig. 5), for which the overall depth range is smaller.

4.2. Indoor Scene Reconstruction: NYU v2

The NYU v2 dataset contains 1449 images, partitioned into 795 training images and 654 test images. All the images were resized, while simultaneously respecting the masks provided with the dataset. In this case, the intrinsic camera parameters are given with the dataset. We evaluated the depth transfer code provided by [13] to obtain baseline results on the training/test partition provided with the dataset and compare these results with those obtained with our approach in Table 3. To be able to compare our results with those reported in [14], we also applied our method in a leave-one-out manner on the full dataset. The results are reported in Table 4. Note that, in both cases, we outperform the baselines for all metrics. These error metrics were computed over the valid pixels (non-zero depth) in the ground-truth depth maps.

Figure 4. Qualitative comparison of the depths estimated with depth transfer [13] and with our method on the NYU v2 dataset. Panels: our method, depth transfer [13], ground truth, image.

Figure 5. NYU v2: Depth maps at different stages of our approach. Panels: ground truth, regression, no sampling, sampling round, final depth.

In Fig. 4, we provide a qualitative comparison of our results with those of [13] for some images. Note that the oversmoothing of the depth maps generated by depth transfer is even more obvious in the short depth range scenario. In contrast, our approach still yields a realistic representation of the scene. In Table 5, we show the influence of the different parts of our model. Note that all the components contribute to our final results. Fig. 5 depicts the depth maps at different stages of our approach. While sampling smooths the depth map, it still respects the image discontinuities.

In addition to the estimated depth, our model can also predict the boundary type of the superpixel edges. In particular, the occlusion boundaries are useful cues for spatial reasoning. We qualitatively evaluate the occlusion boundary prediction by showing typical results in Fig. 6 for both indoor and outdoor scenarios. Note that our model captures most of the occlusion edges.

5. Conclusion

In this paper, we have presented an approach to estimating the depth of a scene from a single image. To this end, we have employed continuous variables to represent the depth of image superpixels, and discrete ones to encode relationships between neighboring superpixels. As a result, we have formulated depth estimation as inference in a higher-order, discrete-continuous graphical model, which we have performed using particle belief propagation. Our experiments

have shown that this model lets us effectively reconstruct general scenes from still images in both the indoor and outdoor scenarios. In the future, we intend to study how this model can be exploited in 3D scene understanding by, e.g., jointly performing semantic labeling and depth estimation.

Figure 6. Estimated boundary occlusion map. The top row shows the input image and the bottom row shows the estimated boundary occlusion map. The superpixel boundaries are drawn in blue. Pixels in magenta denote the estimated occlusion boundaries.

References

[1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Suesstrunk. SLIC superpixels compared to state-of-the-art superpixel methods. PAMI.
[2] O. M. Aodha, N. D. Campbell, A. Nair, and G. Brostow. Patch based synthesis for single depth image super-resolution. In ECCV.
[3] N. Birkbeck, D. Cobzas, and M. Jaegersand. Depth and scene flow from a single moving camera. In 3DPVT.
[4] N. Birkbeck, D. Cobzas, and M. Jaegersand. Basis constrained 3D scene flow on a dynamic proxy. In ICCV.
[5] D. Crandall, A. Owens, N. Snavely, and D. Huttenlocher. Discrete-continuous optimization for large-scale structure from motion. In CVPR.
[6] A. Geiger, C. Wojek, and R. Urtasun. Joint 3D estimation of objects and scene layout. In NIPS.
[7] A. Gupta, A. Efros, and M. Hebert. Blocks world revisited: Image understanding using qualitative geometry and mechanics. In ECCV, 2010.
[8] C. Haene, C. Zach, A. Cohen, R. Angst, and M. Pollefeys. Joint 3D scene reconstruction and class segmentation. In CVPR.
[9] V. Hedau, D. Hoiem, and D. Forsyth. Thinking inside the box: Using appearance models and context based on room geometry. In ECCV, 2010.
[10] D. Hoiem, A. Efros, and M. Hebert. Geometric context from a single image. In ICCV.
[11] D. Hoiem, A. Stein, A. Efros, and M. Hebert. Recovering occlusion boundaries from a single image. In ICCV.
[12] A. Ihler and D. McAllester. Particle belief propagation. In AISTATS.
[13] K. Karsch, C. Liu, and S. B. Kang. Depth extraction from video using non-parametric sampling. In ECCV, 2012.
[14] K. Karsch, C. Liu, and S. B. Kang. Depthtransfer: Depth extraction from video using non-parametric sampling. PAMI.
[15] J. Konrad, G. Brown, M. Wang, P. Ishwar, C. Wu, and D. Mukherjee. Automatic 2D-to-3D image conversion using 3D examples from the internet. In SPIE Stereoscopic Displays and Applications, 2012.
[16] J. Konrad, M. Wang, and P. Ishwar. 2D-to-3D image conversion by learning depth from examples. In 3DCINE, 2012.
[17] D. C. Lee, A. Gupta, M. Hebert, and T. Kanade. Estimating spatial layout of rooms using volumetric reasoning about object and surfaces. In NIPS, 2010.
[18] B. Liu, S. Gould, and D. Koller. Single image depth estimation from predicted semantic labels. In CVPR, 2010.
[19] R. A. Newcombe, S. Lovegrove, and A. J. Davison. DTAM: Dense tracking and mapping in real-time. In ICCV.
[20] J. Peng, T. Hazan, D. McAllester, and R. Urtasun. Convex max-product algorithms for continuous MRFs with applications to protein folding. In ICML, 2011.
[21] C. E. Rasmussen and C. K. Williams. Gaussian Processes for Machine Learning. MIT Press.
[22] B. Russell and A. Torralba. Building a database of 3D scenes from user annotations. In CVPR, 2009.
[23] R. F. Salas-Moreno, R. A. Newcombe, H. Strasdat, P. H. J. Kelly, and A. J. Davison. SLAM++: Simultaneous localisation and mapping at the level of objects. In CVPR.
[24] A. Saxena, S. H. Chung, and A. Y. Ng. 3-D depth reconstruction from a single still image. IJCV, 2007.
[25] A. Saxena, M. Sun, and A. Y. Ng. Make3D: Learning 3D scene structure from a single still image. PAMI, 2009.
[26] A. G. Schwing, T. Hazan, M. Pollefeys, and R. Urtasun. Distributed message passing for large scale graphical models. In CVPR.
[27] A. G. Schwing and R. Urtasun. Efficient exact inference for 3D indoor scene understanding. In ECCV, 2012.
[28] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from RGBD images. In ECCV.
[29] C. Vogel, K. Schindler, and S. Roth. 3D scene flow estimation with a rigid motion prior. In ICCV.
[30] C. Wu, J. M. Frahm, and M. Pollefeys. Repetition-based dense single-view reconstruction. In CVPR, 2011.
[31] K. Yamaguchi, T. Hazan, D. McAllester, and R. Urtasun. Continuous Markov random fields for robust stereo estimation. In ECCV, 2012.
