JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015

Monocular Depth Estimation using Multi-Scale Continuous CRFs as Sequential Deep Networks

Dan Xu, Student Member, IEEE, Elisa Ricci, Member, IEEE, Wanli Ouyang, Senior Member, IEEE, Xiaogang Wang, Senior Member, IEEE, Nicu Sebe, Senior Member, IEEE

arXiv [cs.CV], Mar 2018

Abstract: Depth cues have proven very useful in various computer vision and robotic tasks. This paper addresses the problem of monocular depth estimation from a single still image. Inspired by the effectiveness of recent works on multi-scale convolutional neural networks (CNN), we propose a deep model which fuses complementary information derived from multiple CNN side outputs. Different from previous methods using concatenation or weighted average schemes, the integration is obtained by means of continuous Conditional Random Fields (CRFs). In particular, we propose two different variations, one based on a cascade of multiple CRFs, the other on a unified graphical model. By designing a novel CNN implementation of mean-field updates for continuous CRFs, we show that both proposed models can be regarded as sequential deep networks and that training can be performed end-to-end. Through an extensive experimental evaluation, we demonstrate the effectiveness of the proposed approach and establish new state-of-the-art results for the monocular depth estimation task on three publicly available datasets, i.e. NYUD-V2, Make3D and KITTI.

Index Terms: Monocular Depth Estimation, Convolutional Neural Networks (CNN), Deep Multi-Scale Fusion, Conditional Random Fields (CRFs).

1 INTRODUCTION

While estimating the depth of a scene from a single image is a natural ability for humans, devising computational models for accurately predicting depth information from RGB data is a challenging task. Many attempts have been made to address this problem in the past. In particular, recent works have achieved remarkable performance thanks to powerful deep learning models [], [], [30], [36].
Assuming the availability of a large training set of RGB-depth pairs, monocular depth prediction from single images can be regarded as a pixel-level continuous regression problem, and Convolutional Neural Network (CNN) architectures are typically employed. In the last few years significant efforts have been made in the research community to improve the performance of CNN models for pixel-level prediction tasks (e.g. semantic segmentation, contour detection). Previous works have shown that, for depth estimation as well as for other pixel-level classification or regression problems, more accurate estimates can be obtained by combining information from multiple scales [9], [], [46], [48]. This can be achieved in different ways, e.g. by fusing feature maps corresponding to different network layers or by designing an architecture with multiple inputs corresponding to images at different resolutions.

Dan Xu and Nicu Sebe are with the Department of Information Engineering and Computer Science, University of Trento, Trento, Italy. E-mail: {dan.xu, niculae.sebe}@unitn.it
Elisa Ricci is with Fondazione Bruno Kessler. E-mail: eliricci@fbk.eu
Wanli Ouyang is with the School of Electrical and Information Engineering, The University of Sydney. E-mail: wanli.ouyang@sydney.edu.au
Xiaogang Wang is with the Department of Electronic Engineering, The Chinese University of Hong Kong. E-mail: xgwang@ee.cuhk.edu.hk
Manuscript received April 19, 2005; revised August 26, 2015.

Fig. 1. Monocular depth estimation results on three different benchmark datasets, i.e. NYUD-V2 (the 1st row), Make3D (the 2nd row) and KITTI (the 3rd row), using the proposed multi-scale CRF model with a pretrained CNN (e.g. VGG Convolution-Deconvolution [34]). From left to right, the columns show the original RGB images, the recovered depth maps and the ground truth, respectively.

Other works have demonstrated that, by adding a Conditional Random Field (CRF) in cascade to
a convolutional neural architecture, the performance can be greatly enhanced and the CRF can be fully integrated within the deep model, enabling end-to-end training with back-propagation [5]. However, these works mainly focus on pixel-level prediction problems in the discrete domain (e.g. semantic segmentation). While complementary, so far these strategies have only been considered in isolation, and no previous works have exploited multi-scale information within a CRF inference framework. In this paper we argue that, benefiting from the flexibility

and the representational power of graphical models, we can optimally fuse representations derived from multiple CNN side-output layers using structured constraints, improving performance over traditional multi-scale strategies. By exploiting this idea, we introduce a novel framework to estimate depth maps from single still images. In contrast to previous work fusing multi-scale features by weighted averaging or concatenation, we propose to integrate multi-layer side-output information by designing a novel approach based on continuous CRFs. Specifically, we present two different methods. The first approach is based on a single multi-scale unified CRF model, while the other considers a cascade of scale-specific CRFs. We also show that, by introducing a common CNN implementation for mean-field updates in continuous CRFs, both models are equivalent to sequential deep networks and an end-to-end approach can be devised for training. Through extensive experimental evaluation we demonstrate that the proposed CRF-based approach produces more accurate depth maps than traditional multi-scale approaches for pixel-level prediction tasks [6], [46]. Moreover, by performing experiments on the publicly available NYU Depth V2 [43], Make3D [4] and KITTI [4] datasets, we show that our approach is able to robustly reconstruct depth with good visual quality (Fig. 1) and outperforms state-of-the-art methods for the monocular depth estimation task.

This paper extends our earlier work [50] by proposing and investigating different multi-scale connection structures for message passing, further enriching the related works, providing more details of the approach, and significantly expanding the experimental results and analysis.

To summarize, the contribution of this paper is threefold:

Firstly, we propose a novel approach for predicting depth maps from RGB inputs which exploits multi-scale estimations derived from CNN inner semantic layers by structurally fusing them within a unified CNN-CRF framework.
Secondly, as the task of pixel-level depth prediction implies inferring a set of continuous values, we show how mean-field (MF) updates can be implemented as sequential deep models, enabling end-to-end training of the whole network. We believe that our MF implementation will be useful not only to researchers working on depth prediction, but also to those interested in other problems involving continuous variables. Therefore, our code is made publicly available.

Thirdly, our experiments demonstrate that the proposed multi-scale CRF framework is superior to previous methods integrating information from different semantic network layers by combining multiple losses [46] or by adopting feature concatenations [6]. We also show that our approach outperforms state-of-the-art monocular depth estimation methods on public benchmarks and that the proposed CRF-based models can be employed in combination with different pre-trained CNN architectures, consistently enhancing their performance.

The remainder of this paper is organised as follows. We first introduce related work in Section 2, and then present the proposed multi-scale CRF models for monocular depth estimation in Section 3. We further elaborate how the proposed models can be implemented as sequential neural networks for end-to-end joint optimization in Section 4. The experimental results and analysis are presented in Section 5, and we conclude the paper in Section 6.

2 RELATED WORK

Our approach is built upon recent successes of deep CNN architectures for image classification [7], [3], [44] and fully convolutional networks for dense semantic image segmentation [33], [34]. We briefly introduce the most related works by organizing them into three main aspects, i.e. monocular depth estimation, multi-scale CNNs, and dense pixel-level prediction via combination of CNN and CRFs.

Monocular depth estimation.
Previous approaches for depth estimation from single images can be grouped into three main categories: (i) methods operating on hand-crafted features, (ii) methods based on graphical models and (iii) methods adopting deep convolutional neural networks. Earlier works addressing the depth prediction task belong to the first category. Hoiem et al. [8], [9] proposed photo pop-up, a fully automatic method for creating a basic 3D model from a single photograph by introducing an assumption of ground-vertical geometric structure. Karsch et al. [0] developed Depth Transfer, a non-parametric approach based on SIFT Flow, where the depth of an input image is reconstructed by transferring the depth of multiple similar images and then applying some warping and optimizing procedures. Instead of directly recovering depth from appearance features, Liu et al. [9] explored using semantic scene segmentation results to guide the 3D depth reconstruction. Similarly, Ladicky et al. [5] also demonstrated the benefit of combining semantic object labels with depth features. However, the hand-crafted representations are not robust enough for this challenging problem.

In the second category, some works exploited the flexibility of graphical models to reconstruct depth information. For instance, Delage et al. [0] proposed a dynamic Bayesian framework for recovering 3D information from indoor scenes. Discriminatively-trained multi-scale Markov Random Fields (MRFs) were introduced in [39], [40], in order to optimally fuse local and global features. Depth estimation was treated as an inference problem in a discrete-continuous CRF model in [3]. However, these works did not employ deep networks.

More recent approaches for depth estimation are based on CNNs [], [7], [30], [38], [45]. For instance, Eigen et al. [] proposed a multi-scale approach for depth prediction, considering two deep networks, one performing a coarse global prediction based on the entire image, and the other refining predictions locally.
This approach was extended in [] to handle multiple tasks (e.g. semantic segmentation, surface normal estimation). Wang et al. [45] introduced a CNN for joint depth estimation and semantic segmentation. The obtained estimates were further refined with Hierarchical CRFs. The most similar work to ours is [30], where the representational power of deep CNNs and continuous CRFs is jointly exploited for depth prediction. However, the method proposed in [30] is based on superpixels and the information associated to multiple scales is not exploited in their graphical model.

Fig. 2. Overview of the proposed deep architecture (front-end convolutional neural network, side outputs, and multi-scale fusion with continuous CRFs built from C-MF blocks). Our model is composed of two main components: a front-end CNN and a fusion module. The fusion module uses continuous CRFs to integrate multiple side output maps of the front-end CNN. We consider two different CRF-based multi-scale models and implement them as sequential deep networks by stacking several elementary blocks, the C-MF blocks.

Fig. 3. Illustration of different multi-scale message passing structures for the integration of the multi-scale predictions $s_1$ to $s_5$ produced from the front-end convolutional network: (a) bottom up structure, (b) top down structure, (c) skip connection structure, (d) all to one structure. The arrows represent the direction of the message passing, and the numbers in circles represent the order. The dashed line box in Fig. 2 shows a bottom-up passing structure.

Multi-Scale CNNs. The problem of combining information from multiple scales has recently received considerable interest in various computer vision tasks. In [46] a deeply supervised fully convolutional neural network was proposed for edge detection by weighted combination of multiple side outputs. Skip-layer networks, where the feature maps derived from different semantic layers of a primary front-end network are jointly considered in an output layer, have also become very popular [3], [6], [33]. Other works considered multi-stream architectures, where multiple parallel networks receiving inputs at different scales are fused [4]. Cai et al. [5] proposed a multi-scale method for object detection that combines the predictions obtained from feature maps with different resolutions. Dilated convolutions (e.g.
dilation or à trous) have also been employed in different deep network models in order to aggregate multi-scale contextual information [7]. However, in these works, the multi-scale representations or estimations are typically combined by using simple concatenation or weighted averaging operations. We are not aware of previous works exploring the fusion of deep multi-scale information within a CRF framework.

Dense pixel-level prediction via combination of CNN and CRFs. The combination of CNN and CRFs has proven very useful for dense pixel-level structured prediction [], [4]. Some existing works utilize CRFs as a post-processing module for further refining the predictions from the CNN [8], [35]. To benefit from end-to-end learning, Zheng et al. [5] proposed a CRF-RNN model which jointly optimizes a front-end deep network with a discrete CRF for semantic image segmentation. Xu et al. [47] proposed an attention-gated deep CRF framework for pixel-level contour prediction. However, as far as we know, this work is a first attempt to combine multi-scale continuous CRFs with a deep convolutional neural network for constructing a unified model for end-to-end monocular depth estimation.

3 MULTI-SCALE CRF MODELS FOR MONOCULAR DEPTH ESTIMATION

In this section we introduce our deep model with the designed multi-scale continuous CRFs for monocular depth estimation from RGB images. We first formalize the problem of depth prediction and give a brief overview of the proposed approach. Then, we describe two different variants of the proposed multi-scale model, one based on a cascade of CRFs and the other on a single multi-scale unified CRF.

3.1 Problem Formulation and Overview

Following previous works we formulate the task of depth prediction from monocular RGB input as the problem of learning a non-linear mapping $F: \mathcal{I} \to \mathcal{D}$ from the image space $\mathcal{I}$ to the output depth space $\mathcal{D}$. More formally, let $\mathcal{Q} = \{(r_i, \bar{d}_i)\}_{i=1}^{Q}$ be a training set of $Q$ pairs, where $r_i \in \mathcal{I}$

denotes an input RGB image with $N$ pixels and $\bar{d}_i \in \mathcal{D}$ represents its corresponding real-valued depth map. For learning $F$ we consider a deep model made of two main building blocks (Fig. 2). The first component is a CNN architecture with a set of intermediate side outputs $S = \{s_l\}_{l=1}^{L}$, $s_l \in \mathbb{R}^{N}$, produced from $L$ different layers with a mapping function $f_s(r; \Theta, \theta_l) \to s_l$. For simplicity, we denote with $\Theta$ the set of front-end network layer parameters and with $\theta_l$ the parameters of the network branch producing the side output associated to the $l$-th layer (see Section 5.2 for details of our implementation). In the following we denote this network as the front-end CNN.

The second component of our model is a fusion block. As shown in previous works [3], [33], [46], features generated from different CNN layers capture complementary information. The main idea behind the proposed fusion block is to use CRFs to effectively integrate the side output maps of our front-end CNN for robust depth prediction. Our approach builds on the intuition that these representations can be combined within a sequential framework, i.e. performing depth estimation at a certain scale and then refining the obtained estimates at the subsequent level. Specifically, we introduce and compare two different multi-scale models, both based on CRFs, and corresponding to two different versions of the fusion block. The first model is based on a single multi-scale unified CRF, which integrates information available from different scales and simultaneously enforces smoothness constraints between the estimated depth values of neighboring pixels and neighboring scales. The second model implements a cascade of scale-specific CRFs: at each scale $l$ a CRF is employed to recover the depth information from the side output maps $s_l$, and the outputs of each CRF model are used as additional observations for the subsequent model. In Section 3.2
we describe the two models in detail, while in Section 4 we show how they can be implemented as sequential deep networks by stacking several elementary blocks. We call these blocks C-MF blocks, as they implement Mean-Field updates for Continuous CRFs (Fig. 2).

3.2 Multi-scale Fusion with Continuous CRFs

We now elaborate the proposed CRF-based models for fusing multi-scale side outputs derived from different semantic layers of the front-end deep convolutional neural network.

3.2.1 Multi-Scale Unified CRF Model

Given a vector $\hat{s}$ with a dimension of $LN$ obtained by concatenating the side output score maps $\{s_1, \ldots, s_L\}$ and a vector $d$ with a dimension of $LN$ expressing real-valued output variables, we define a CRF modeling the following conditional distribution:

$$P(d \mid \hat{s}) = \frac{1}{Z(\hat{s})} \exp\{-E(d, \hat{s})\}, \tag{1}$$

where $Z(\hat{s}) = \int_{d} \exp\{-E(d, \hat{s})\}\,\mathrm{d}d$ is the partition function [6] acting as a normalization factor for probabilities. The energy function is defined as:

$$E(d, \hat{s}) = \sum_{i=1}^{N} \sum_{l=1}^{L} \phi(d_i^l, \hat{s}) + \sum_{i,j} \sum_{l,k} \psi(d_i^l, d_j^k), \tag{2}$$

where $d_i^l$ indicates the hidden variable associated to scale $l$ and pixel $i$. The first term is the sum of quadratic unary terms defined as:

$$\phi(d_i^l, \hat{s}) = \left(d_i^l - s_i^l\right)^2, \tag{3}$$

where $s_i^l$ is the regressed depth value at pixel $i$ and scale $l$ obtained with $f_s(r; \Theta, \theta_l)$. The second term is the sum of pairwise potentials describing the relationship between pairs of hidden variables $d_i^l$ and $d_j^k$ and is defined as follows:

$$\psi(d_i^l, d_j^k) = \sum_{m=1}^{M} \beta_m\, w_m(i, j, l, k, r)\left(d_i^l - d_j^k\right)^2, \tag{4}$$

where $w_m(i, j, l, k, r)$ is a weight which specifies the relationship between the estimated depth of the pixels $i$ and $j$ at scale $l$ and $k$, respectively, and $M$ is the number of kernels.

To perform inference we rely on mean-field theory to approximate $P(d \mid \hat{s})$ with another distribution $Q(d \mid \hat{s})$, where $Q(d \mid \hat{s}) = \prod_{i=1}^{N} \prod_{l=1}^{L} Q_{i,l}(d_i^l \mid \hat{s})$ is a product of independent marginals. By minimizing the Kullback-Leibler divergence between the distributions $P$ and $Q$, we obtain the solution for $Q$. As the log distribution $\log Q_{i,l}(d_i^l \mid \hat{s})$ has a quadratic form w.r.t.
$d_i^l$ and can be represented as a Gaussian distribution, the following mean-field updates can be derived:

$$\gamma_{i,l} = 2\Big(1 + 2 \sum_{m=1}^{M} \beta_m \sum_{k} \sum_{j,i} w_m(i, j, l, k, r)\Big), \tag{5}$$

$$\mu_{i,l} = \frac{2}{\gamma_{i,l}} \Big(s_i^l + 2 \sum_{m=1}^{M} \beta_m \sum_{k} \sum_{j,i} w_m(i, j, l, k, r)\, \mu_{j,k}\Big). \tag{6}$$

Here $\gamma_{i,l}$ and $\mu_{i,l}$ are the variance and mean of the distribution $Q_{i,l}$, respectively. To define the weights $w_m(i, j, l, k, r)$ we introduce the following assumptions. First, we assume that the estimated depth at scale $l$ only depends on the depth estimated at the previous scale. Second, for relating pixels at the same and at the previous scale, we set weights depending on $m$ kernel functions $K_m^{ij}$, which are Gaussian kernels of the form $\exp\big(-\frac{\|h_i^m - h_j^m\|_2^2}{2\theta_m^2}\big)$. Here, $h_i^m$ and $h_j^m$ indicate some features derived from the input image $r$ for pixels $i$ and $j$, and $\theta_m$ are user-defined bandwidth parameters []. Following previous works [], [5], we use pixel positions and color values as features, leading to two kernel functions (a bilateral appearance kernel using both the pixel position and the color value features, and a spatial smoothness kernel using only the pixel position features) for modeling dependencies of pixels at scale $l$, and two others for relating pixels at neighboring scales. Under these assumptions, the mean-field updates (5) and (6) can be rewritten as:

$$\gamma_{i,l} = 2\Big(1 + 2 \sum_{m=1}^{2} \beta_m \sum_{j \neq i} K_m^{ij} + 2 \sum_{m=3}^{4} \beta_m \sum_{j,i} K_m^{ij}\Big), \tag{7}$$

$$\mu_{i,l} = \frac{2}{\gamma_{i,l}} \Big(s_i^l + 2 \sum_{m=1}^{2} \beta_m \sum_{j \neq i} K_m^{ij} \mu_{j,l} + 2 \sum_{m=3}^{4} \beta_m \sum_{j,i} K_m^{ij} \mu_{j,l-1}\Big). \tag{8}$$

The parameters $\beta_m$ need to be learned during training. We will present the details of the parameter optimization in

Section 4. Given a new test image, the optimal $\tilde{d}$ can be computed by maximizing the log conditional probability [37], i.e. $\tilde{d} = \arg\max_{d} \log(Q(d \mid S))$, where $\tilde{d} = [\mu_{1,1}, \ldots, \mu_{N,L}]$ is a vector of the $LN$ mean values associated to $Q(d \mid \hat{s})$. We take the estimated variables at the finest scale $L$ (i.e. $\mu_{1,L}, \ldots, \mu_{N,L}$) as our predicted depth map $d^\star$.

Fig. 4. Detailed computing flow graph of the proposed C-MF block (input data blobs, bilateral and spatial filtering of the mean maps with the kernels $K_1$ to $K_4$, adding the unary term, normalizing, output data blobs). $J$ represents a $W \times H$ matrix with all elements equal to one. The operator symbols in the graph indicate element-wise addition, subtraction, division and Gaussian convolution, respectively. $G_1$ and $G_2$ represent two gate functions for controlling the computing flow.

3.2.2 Multi-Scale Cascade CRF Model

The cascade model is based on a set of $L$ CRF models, each one associated to a specific scale $l$, which are progressively stacked such that the estimated depth at the previous scale can be used as an observation of the CRF model at the following scale level. Each CRF is used to compute the output vector $d^l$ and is constructed considering the side output representations $s_l$ and the estimated depth at the previous step $\tilde{d}^{l-1}$ as observed variables, i.e. $o^l = [s_l, \tilde{d}^{l-1}]$. The associated energy function of the CRF model is defined as:

$$E(d^l, o^l) = \sum_{i=1}^{N} \phi(d_i^l, o^l) + \sum_{i \neq j} \psi(d_i^l, d_j^l). \tag{9}$$

The unary and pairwise terms can be defined analogously to the above-introduced unified multi-scale model.
In particular the unary term, reflecting the similarity between the observation $o_i^l$ and the hidden depth value $d_i^l$, is:

$$\phi(d_i^l, o^l) = \left(d_i^l - o_i^l\right)^2, \tag{10}$$

where $o_i^l$ is obtained by combining the regressed depth from the side output $s_l$ and the map $\tilde{d}^{l-1}$ estimated by the CRF at the previous scale. In our implementation we simply consider $o_i^l = s_i^l + \tilde{d}_i^{l-1}$, but other strategies can also be considered. The pairwise potentials, used to force neighboring pixels with similar appearance to have close depth values, are:

$$\psi(d_i^l, d_j^l) = \sum_{m=1}^{M} \beta_m K_m^{ij}\left(d_i^l - d_j^l\right)^2, \tag{11}$$

where we consider $M = 2$ Gaussian kernels, one for appearance features and the other accounting for pixel positions. Similar to the multi-scale CRF model, under the mean-field approximation, the following updates can be derived:

$$\gamma_{i,l} = 2\Big(1 + 2 \sum_{m=1}^{M} \beta_m \sum_{j \neq i} K_m^{ij}\Big), \tag{12}$$

$$\mu_{i,l} = \frac{2}{\gamma_{i,l}} \Big(o_i^l + 2 \sum_{m=1}^{M} \beta_m \sum_{j \neq i} K_m^{ij} \mu_{j,l}\Big). \tag{13}$$

At test time, we use the estimated depth variables corresponding to the cascade CRF model of the finest scale $L$ as our final predicted depth map $d^\star$.

4 MULTI-SCALE MODELS AS SEQUENTIAL DEEP NETWORKS

In this section, we describe how the two proposed CRF-based models can be implemented as sequential deep networks, enabling end-to-end training of our whole deep network model (the front-end CNN and the fusion module). We first show how the mean-field iterations derived for the multi-scale and the cascade models can be implemented by designing a common structure, the continuous mean-field updating (C-MF) block, consisting of a stack of CNN operations. Then, we present the resulting sequential network structures and the details of the training phase for optimizing the whole deep network.

4.1 C-MF: a Common CNN Implementation of Continuous Mean-Field Updating

By analyzing the two proposed CRF models, we can observe that the mean-field updates derived for the cascade and for the multi-scale models share common terms. As stated above, the main difference between the two is the way the depth estimated at the previous scale is handled at the current scale.
In the multi-scale CRFs, the relationship among neighboring scales is modeled in the hidden variable space, while

in the cascade CRFs the depth estimated at the previous scale acts as an observed variable. Starting from this observation, in this section we show how the computation of Eq. (8) and Eq. (13) can be implemented with a common structure. Figure 4 describes these computations in detail.

Fig. 5. Description of the proposed two CRF models as sequential deep networks: (a) the proposed multi-scale cascade CRF model as a sequential neural network using the C-MF block; (b) the proposed multi-scale unified CRF model as a sequential neural network using the C-MF block. The blue and yellow boxes indicate the estimated variables and observations, respectively. The parameters $\beta_m$ are used for mean-field updates. As in the cascade model parameters are not shared among different CRFs, we use the notation $\beta_1^l$, $\beta_2^l$ to denote parameters associated to the $l$-th scale.

In the following, for the sake of clarity, we introduce a matrix representation. Let $S_l \in \mathbb{R}^{W \times H}$ be the matrix obtained by rearranging the $N = WH$ pixels corresponding to the side output vector $s_l$, and $\mu_l^t \in \mathbb{R}^{W \times H}$ the matrix of the estimated output depth variables associated to scale $l$ and mean-field iteration $t$. To implement the multi-scale model at each iteration $t$, $\mu_{l-1}^t$ and $\mu_l^t$ are convolved with two Gaussian kernels. Following [], we use a spatial and a bilateral kernel. As Gaussian convolutions represent the computational bottleneck of the mean-field iterations (requiring a complexity of $O(N^2)$), we adopt the permutohedral lattice implementation [] to approximate the filter response calculation, reducing the computational cost from quadratic to linear [37]. The weighting by the parameters $\beta_m$ is performed as a convolution with a $1 \times 1$ kernel.
Then, the outputs are combined and added to the side-output maps $S_l$. Finally, a normalization step follows, corresponding to the calculation of Eq. (7). The normalization matrix $\gamma \in \mathbb{R}^{W \times H}$ is also computed by considering convolutions with Gaussian kernels and weighting with the parameters $\beta_m$. It is worth noting that the normalization step in our mean-field updates for continuous CRFs is substantially different from that of the discrete CRFs in CRF-RNN [5], which is based on a softmax function.

In the cascade CRF model, differently from the multi-scale unified CRF model, $\mu_{l-1}^t$ acts as an observed variable. To design a common C-MF block for the two models, we introduce two gate functions $G_1$ and $G_2$ (Fig. 4) that control the computing flow and make it easy to switch between the two approaches. Both gate functions accept a user-defined boolean parameter. In our setting, the value 1 corresponds to the multi-scale CRF and the value 0 corresponds to the cascade model. Specifically, if $G_1$ is equal to 1, the gate function $G_1$ passes $\mu_{l-1}^t$ to the Gaussian filtering block; otherwise it passes it to the element-wise addition block with the computed message. Similarly, $G_2$ controls the computation of the normalization terms and switches between the computation of Eq. (7) and Eq. (12). In other words, if $G_2$ is equal to 0, then the Gaussian filtering and weighting

operations for $\gamma_3$ and $\gamma_4$ are disabled. Importantly, for each step in the C-MF block we implement the calculation of error differentials for back-propagation as in [5]. There are two different types of CRF parameters to be learned, i.e. the bandwidth parameters $\theta_m$ and the Gaussian kernel weights $\beta_m$. For optimizing these CRF parameters, similar to [], the bandwidth values $\theta_m$ are pre-defined to simplify the calculation, and we implement the backward differential computation for the weights of the Gaussian kernels $\beta_m$. In this way the $\beta_m$ are learned automatically with back-propagation.

4.2 From Mean-Field Updates to Sequential Deep Networks

Fig. 5 illustrates the implementation of the proposed two CRF-based models using the C-MF block described above. In the figure, each blue-dashed box is associated to a mean-field iteration. The cascade model, as shown in Fig. 5(a), consists of $L$ single-scale CRFs. At the $l$-th scale, $t_l$ mean-field iterations are performed and then the estimated depth outputs are passed to the CRF model of the subsequent scale after a Rectified Linear Unit (ReLU) operation. The ReLU is used here for two reasons: first, the depth predictions should always be positive, and second, we want to increase the non-linearity of the sequential network for better mapping. To implement a single-scale CRF, we stack $t_l$ C-MF blocks and make them share the parameters, while we learn different parameters for different CRFs. For the multi-scale model, one full mean-field update involves $L$ scales simultaneously and is obtained by combining $L$ C-MF blocks. We further stack $T$ iterations for learning and inference. The parameters corresponding to different scales and different mean-field iterations are shared. In this way, by using the common C-MF layer, we implement the two proposed multi-scale continuous CRF models as deep sequential networks, enabling end-to-end training with the front-end network.
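The stacking described above can be illustrated with a small NumPy sketch. It is not the authors' Caffe implementation: it uses dense $N \times N$ Gaussian kernels instead of the permutohedral lattice, position features only, and the hypothetical names `gaussian_kernel`, `c_mf_step` and `cascade_crf`. The bandwidth, the kernel weights, and the initialisation of the mean with the observation are all assumptions made for illustration. The optional `mu_prev` argument loosely mimics the gate: when it is given, the previous scale is filtered as a hidden variable (in the spirit of Eqs. 7 and 8); when it is omitted, the step reduces to the single-scale update of Eqs. (12) and (13).

```python
import numpy as np


def gaussian_kernel(features, bandwidth, keep_diagonal=False):
    """Dense kernel K[i, j] = exp(-||h_i - h_j||^2 / (2 * bandwidth^2))."""
    diff = features[:, None, :] - features[None, :, :]
    kernel = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * bandwidth ** 2))
    if not keep_diagonal:
        np.fill_diagonal(kernel, 0.0)  # same-scale sums run over j != i
    return kernel


def c_mf_step(mu, obs, kernels, betas,
              mu_prev=None, cross_kernels=(), cross_betas=()):
    """One continuous mean-field update (hypothetical helper).

    Without mu_prev this follows Eqs. (12)-(13); with mu_prev the
    previous-scale means are filtered too, as in Eqs. (7)-(8).
    """
    message = sum(b * (k @ mu) for b, k in zip(betas, kernels))
    norm = sum(b * k.sum(axis=1) for b, k in zip(betas, kernels))
    if mu_prev is not None:
        message = message + sum(
            b * (k @ mu_prev) for b, k in zip(cross_betas, cross_kernels))
        norm = norm + sum(
            b * k.sum(axis=1) for b, k in zip(cross_betas, cross_kernels))
    gamma = 2.0 * (1.0 + 2.0 * norm)
    return (2.0 / gamma) * (obs + 2.0 * message)


def cascade_crf(side_outputs, kernels, betas_per_scale, iterations=3):
    """Cascade of single-scale CRFs with a ReLU between scales."""
    d_prev = np.zeros_like(side_outputs[0])  # assumption: no prior at scale 1
    for s, betas in zip(side_outputs, betas_per_scale):
        obs = s + d_prev            # o = s + previous estimate
        mu = obs.copy()             # assumption: initialise with the observation
        for _ in range(iterations): # betas are shared across iterations
            mu = c_mf_step(mu, obs, kernels, betas)
        d_prev = np.maximum(mu, 0.0)  # ReLU keeps depth non-negative
    return d_prev


# Tiny 4 x 4 example with a spatial kernel built from pixel positions.
h, w = 4, 4
ys, xs = np.mgrid[0:h, 0:w]
positions = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
spatial = gaussian_kernel(positions, bandwidth=1.5)
side_outputs = [np.full(h * w, 1.0) for _ in range(3)]
depth = cascade_crf(side_outputs, [spatial], [[0.5]] * 3)
```

A constant map is a fixed point of the update, which gives a quick sanity check: with three constant side outputs equal to one and the additive observation rule, each scale adds one to the running estimate.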
4.3 Multi-Scale Message Passing Structures

Since the proposed work aims at multi-scale structured fusion and prediction, the connection structure used for message passing between the different multi-scale predictions plays an important role in the performance. In this section, we thus propose and investigate different message passing structures. Fig. 3 illustrates several structures, including the top down structure, the skip connection structure and the all to one structure. The top down structure is similar to the bottom up structure depicted in Fig. 2, which gradually refines the score maps from coarse to fine. The skip connection structure aims at utilizing more complementary information by skipping scales. The all to one structure uses all the other scales to refine the finest scale. Since all the message passing structures involve two scales at a time, we are able to build all of these connection structures using the C-MF block described above. The experimental investigation of these structures is presented in the experimental part.

TABLE 1
The parameter details of the sub-network for generating the side output from the last-scale convolutional block of ResNet-50.

Name             conv_s5   deconv_s5_1   deconv_s5_2   deconv_s5_3   deconv_s5_4   pred
Type             conv      deconv        deconv        deconv        deconv        deconv & crop
Kernel
Stride, Padding
Activation       ReLU      ReLU          ReLU          ReLU          ReLU

4.4 Optimization of the Whole Network

We train the whole network using a two-phase scheme. In the first phase (pretraining), the parameters of the base front-end network $\Theta$ and the parameters of the side-output generation sub-branch networks $\vartheta = \{\theta_l\}_{l=1}^{L}$ are learned by minimizing the sum of $L$ distinct side losses as in [46], corresponding to the $L$ side outputs. We define the optimization objective using a square loss over $Q$ training samples as follows:

$$\{\Theta^\star, \vartheta^\star\} = \arg\min_{\Theta, \vartheta} \sum_{l=1}^{L} \sum_{i=1}^{Q} \left\| f_s(r_i; \Theta, \theta_l) - \bar{d}_i \right\|_2^2, \tag{14}$$

where $\bar{d}_i$ denotes the $i$-th ground-truth sample.
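As a minimal sketch of the pretraining objective in Eq. (14), the following NumPy function sums the squared error of every side output against the same ground truth. The function name `side_output_loss` and the array layout (one array of shape `(Q, N)` per scale) are assumptions for illustration; the network itself and the optimizer are out of scope here.

```python
import numpy as np


def side_output_loss(side_predictions, ground_truth):
    """Sum over scales and samples of the squared L2 error (Eq. 14 sketch).

    side_predictions: list of L arrays, each of shape (Q, N).
    ground_truth: array of shape (Q, N) shared by all scales.
    """
    total = 0.0
    for preds in side_predictions:
        total += float(np.sum((preds - ground_truth) ** 2))
    return total


# Two scales, Q = 2 samples, N = 3 pixels each.
truth = np.zeros((2, 3))
preds = [np.ones((2, 3)), 2.0 * np.ones((2, 3))]
loss = side_output_loss(preds, truth)  # 6 * 1 + 6 * 4 = 30
```

Every side output is supervised with the full-resolution ground truth, so each branch is pushed towards a usable depth estimate before the CRF fusion is trained.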
In the second phase (fine tuning), we initialize the front-end network with the parameters $\{\Theta^\star, \vartheta^\star\}$ learned in the first phase, and jointly fine-tune it with the proposed multi-scale CRF models to compute the optimal value of the parameters $\Theta$, $\vartheta$ and $\beta$, with $\beta = \{\beta_m\}_{m=1}^{M}$. The entire network is learned with Stochastic Gradient Descent (SGD) by minimizing a square loss:

$$\{\Theta^\star, \vartheta^\star, \beta^\star\} = \arg\min_{\Theta, \vartheta, \beta} \sum_{i=1}^{Q} \left\| F(r_i; \Theta, \vartheta, \beta) - \bar{d}_i \right\|_2^2. \tag{15}$$

When the whole network optimization is finished, testing can be performed end-to-end, i.e. given a test RGB image as input the network directly outputs an estimated depth map.

5 EXPERIMENTS

To demonstrate the effectiveness of the proposed multi-scale CRF models for monocular depth prediction, we performed experiments on three publicly available datasets: the NYU Depth V2 [43], the Make3D [39] and the KITTI [4] datasets. In the following we first describe the experimental setup and the implementation details, and then present the experimental results and analysis.

5.1 Experimental Setup

5.1.1 Datasets

The NYU Depth V2 dataset [43] contains 120K unique pairs of RGB and depth images captured with a Microsoft Kinect. The dataset consists of 249 scenes for training and 215 scenes for testing, with all images at a fixed resolution. To speed up the training phase, following previous works [30], [53] we consider only a small subset of images. This subset has 1449 aligned RGB-depth pairs: 795 pairs are used for training, 654 for testing. Following [], we perform data augmentation for the training

samples. The RGB and depth images are scaled with a ratio $\rho \in \{1, 1.2, 1.5\}$ and the depths are divided by $\rho$. Additionally, we horizontally flip all the samples and randomly crop them to a fixed size. The data augmentation phase produces 4770 training pairs in total.

The Make3D dataset [39] contains 534 RGB-depth pairs, split into 400 pairs for training and 134 for testing. We resize all the images to a common resolution, as done in [3], to preserve the aspect ratio of the original images. We adopt the same data augmentation scheme used for the NYU Depth V2 dataset but, for $\rho \in \{1.2, 1.5\}$, we randomly generate two samples each via cropping, obtaining 4K training samples.

The KITTI dataset [4], built for various computer vision tasks in the context of autonomous driving, contains depth videos captured through a LiDAR sensor deployed on a driving vehicle. For the training and testing split, we follow the protocol of Eigen et al. [] for a better comparison with existing works. Specifically, 61 scenes are selected from the raw data. In total, 22,600 images from 32 scenes are used for training, and 697 images from the other 29 scenes are used for testing. Following [3], the ground-truth depth maps are generated by reprojecting the 3D points collected by the Velodyne laser into the left monocular camera. The resolution of the RGB images is reduced by half with respect to the original for both training and testing.

Fig. 6. Examples of qualitative depth prediction results of different methods on the NYU Depth V2 test dataset. Different front-end deep network architectures are investigated. VGG-CD-MSCRF and ResNet-MSCRF denote our approach with the proposed multi-scale continuous CRF model plugged into the VGG-CD and ResNet-50 networks, respectively.

5.1.2 Evaluation Metrics

Following previous works [], [], [45], we adopt the following evaluation metrics to quantitatively assess the performance of our depth prediction model.
Specifically, we consider:

- mean relative error (rel): $\frac{1}{P}\sum_{i=1}^{P} \frac{|\tilde{d}_i - d_i|}{d_i}$;
- root mean squared error (rms): $\sqrt{\frac{1}{P}\sum_{i=1}^{P} (\tilde{d}_i - d_i)^2}$;
- mean log10 error (log10): $\frac{1}{P}\sum_{i=1}^{P} \left| \log_{10}(\tilde{d}_i) - \log_{10}(d_i) \right|$;
- scale-invariant rms log error as used in [], rms (sc-inv.);
- accuracy with threshold t: percentage (%) of $d_i$ subject to $\max\left(\frac{d_i}{\tilde{d}_i}, \frac{\tilde{d}_i}{d_i}\right) = \delta < t$, with $t \in [1.25, 1.25^2, 1.25^3]$.

Here $d_i$ and $\tilde{d}_i$ are the ground-truth depth and the estimated depth at pixel i, respectively, and P is the total number of pixels of the test images.

5.2 Implementation Details

We implemented the proposed deep model using the popular Caffe framework [5] on a single Nvidia Tesla K80 GPU with 12 GB of memory. More details on the front-end CNN architectures, the generation of the multi-scale side outputs and the parameter settings are given below.

5.2.1 Front-end CNN Architectures

To study the influence of the front-end CNN, we consider several network architectures: (i) AlexNet [3], (ii) VGG-16 [44], (iii) a fully convolutional encoder-decoder network derived from VGG-16, referred to as VGG-ED [], (iv) a convolution-deconvolution network based on VGG-16, referred to as VGG-CD [34], and (v) ResNet-50 [7]. For AlexNet, VGG-16 and ResNet-50, we obtain the side outputs from the last semantic convolutional layer of different convolutional blocks, within each of which the layers produce feature maps of the same shape. The scheme used for their generation is introduced in the next section. The number of side outputs considered in our experiments is 5, 5 and 4 for AlexNet, VGG-16 and ResNet-50, respectively. As VGG-ED and VGG-CD have been widely used for dense pixel-level prediction tasks, we also investigate them in the experimental analysis. Both VGG-ED and VGG-CD have a

symmetric network structure, and five side outputs are generated from the different blocks of the decoder or of the deconvolutional part of the network.

TABLE 2
Quantitative performance comparison of different front-end deep network architectures and the proposed two multi-scale CRF models associated with the pretrained front-end networks on the NYU Depth V2 dataset.
Columns: Network Architecture; Error (lower is better): rel, log10, rms; Accuracy (higher is better): δ < 1.25, δ < 1.25^2, δ < 1.25^3.
Rows: AlexNet (pretrain); VGG-16 (pretrain); VGG-ED (pretrain); VGG-CD (pretrain); ResNet-50 (pretrain); AlexNet + cascade-CRFs; VGG-16 + cascade-CRFs; VGG-ED + cascade-CRFs; VGG-CD + cascade-CRFs; ResNet-50 + cascade-CRFs.

5.2.2 Generation of multi-scale CNN side outputs

Our approach can be applied to any multi-scale front-end CNN model, including those with skip connections. Here we briefly describe the scheme we adopt to build the CNN side outputs of the front-end CNN for the multi-scale fusion with CRFs. In [46], a convolutional layer is first used to generate a score map from the feature map, and then a deconvolutional (deconv) layer is adopted as a bilateral upsampling operator to enlarge the score map to the size of the input image. However, we noticed that with the approach in [46] the side outputs generated from the smaller feature maps are very coarse, so that many scene details are lost. To address this problem, after the convolutional layer we stack several deconv layers, each of them enlarging the output map by a factor of two. A Rectified Linear Unit (ReLU) is applied after each deconv layer. After the last deconv layer, we use a crop layer to cut the extra margin and obtain a side output with the same resolution as the ground-truth image.
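The stacking scheme described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the learned deconv layers are replaced by fixed nearest-neighbour ×2 upsampling, the crop is taken around the centre, and the helper names `num_doublings`, `upsample_x2` and `side_output` as well as the toy sizes are hypothetical rather than taken from the paper.

```python
import numpy as np


def num_doublings(feature_size, target_size):
    """Smallest k such that feature_size * 2**k covers target_size."""
    k = 0
    while feature_size * 2 ** k < target_size:
        k += 1
    return k


def upsample_x2(x):
    """Nearest-neighbour x2 upsampling, a stand-in for a learned deconv layer."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def side_output(score_map, target_hw):
    """Upsample a coarse score map by repeated doubling, then crop the margin."""
    h, w = target_hw
    k = max(num_doublings(score_map.shape[0], h),
            num_doublings(score_map.shape[1], w))
    out = score_map
    for _ in range(k):
        out = np.maximum(upsample_x2(out), 0.0)  # ReLU after each upsampling step
    top = (out.shape[0] - h) // 2   # centre crop (an assumption of this sketch)
    left = (out.shape[1] - w) // 2
    return out[top:top + h, left:left + w]


# Toy example: an 8x10 score map brought to a 240x320 target.
score = np.ones((8, 10))
out = side_output(score, (240, 320))
```

With these toy sizes five doublings are needed (8 → 256 rows), and the crop removes the 16 extra rows. A learned deconv layer would replace `upsample_x2` in a real network.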
We employ this scheme to obtain side outputs for AlexNet, VGG-16 and ResNet-50, while for VGG-CD and VGG-ED we use the same setting as in [46], as their decoder or deconvolutional part is able to produce more fine-grained side outputs. Table 1 shows the detailed network parameters used to obtain the side output from the last convolutional block of ResNet-50 (i.e. from the layer res5c).

5.2.3 Parameter settings

As described in Section 4.4, training consists of a pretraining and a fine-tuning phase. In the first phase, we train the front-end CNN with parameters initialized from the corresponding ImageNet pretrained models. For AlexNet, VGG-16, VGG-ED and VGG-CD, the batch size is set to and for ResNet-50 to 8. The learning rate is initialized at 0 and is decreased by a factor of 10 around every 50 epochs. In total, 80 epochs are performed for pretraining. The momentum and the weight decay are set to 0.9 and , respectively. When the pretraining is finished, we connect all the side outputs of the front-end CNN to our CRF-based multi-scale deep models for end-to-end training of the whole network. In this phase, the batch size is reduced to 6 and a fixed learning rate of 0 is used. The momentum and weight decay are the same as in the pretraining phase. The bandwidth weights for the Gaussian kernels are obtained through cross-validation. The number of mean-field iterations is set to 5 for efficient training for both the cascade CRFs and the multi-scale CRFs. We do not observe significant improvement when using more than 5 iterations. Training the whole network takes around 5 hours on the Make3D dataset, 8 hours on the KITTI dataset and 3 hours on the NYU v2 dataset.

5.3 Experimental Results

To present the experimental results, we start with an ablation study investigating the performance impact of different front-end network architectures, the effectiveness of the proposed CRF-based multi-scale fusion models and the influence of the stacking order used to build the sequential neural network.
Then we compare the overall performance with state-of-the-art methods, and finally we analyze the qualitative results and the running time.

5.3.1 Evaluation of different front-end CNN architectures

As discussed above, the proposed multi-scale CRF-based fusion models are general, and different deep architectures can be used for the front-end network. In this section we evaluate the impact of this choice on the depth estimation performance. We consider both the case of the pretrained front-end models (i.e. only the side losses are employed and the multi-scale CRF models are not plugged in), indicated with "pretrain", and the case of the fine-tuned models, which include the front-end network with the multi-scale cascade CRFs ("cascade-CRFs"). The results of the experiments are shown in Table 2. As expected, in both cases deeper CNN architectures produce more accurate predictions, and ResNet-50 achieves the best performance among all the front-end networks. Moreover, VGG-CD is slightly better than VGG-ED, and both of these models outperform VGG-16, showing that the symmetric network structure is beneficial for the

TABLE 3
Quantitative baseline comparison with different multi-scale fusion schemes, and with the continuous CRF as a post-processing module, on the NYU Depth V2 dataset. The number of scales is investigated for both multi-scale models with a bottom-up message passing structure.
Columns: Method; Error (lower is better): rel, log10, rms; Accuracy (higher is better): δ < 1.25, δ < 1.25^2, δ < 1.25^3.
Rows: HED [46]; Hypercolumn [6]; C-CRF; Ours (single-scale); Ours - cascade (3-scale); Ours - cascade (5-scale); Ours - unified (3-scale); Ours - unified (5-scale).

TABLE 4
Quantitative performance evaluation of different message passing structures for the cascade CRF model, obtained by building the sequential deep network with the proposed C-MF block, on the NYU Depth V2 dataset.
Columns: Method; Error (lower is better): rel, log10, rms; Accuracy (higher is better): δ < 1.25, δ < 1.25^2, δ < 1.25^3.
Rows: Top-down structure; Bottom-up structure; Skip-connection structure; All-to-one structure.

TABLE 5
Overall performance comparison with state-of-the-art methods on the NYU Depth V2 dataset. Our approach achieves the best results on most of the metrics, while the runners-up Eigen and Fergus [] and Laina et al. [7] employ more training data than ours. ResNet-50-unified means using the ResNet-50 front-end network with the proposed multi-scale unified CRF model.
Columns: Method; Error (lower is better): rel, log10, rms, rms (sc-inv.); Accuracy (higher is better): δ < 1.25, δ < 1.25^2, δ < 1.25^3.
Rows: Karsch et al. [4]; Ladicky et al. [0]; Liu et al. [3]; Ladicky et al. [5]; Zhuo et al. [53]; Liu et al. [30]; Wang et al. [45]; Eigen et al. []; Roi and Todorovic [38]; Eigen and Fergus []; Laina et al. [7]; Ours (ResNet-50-unified-4.7K-bottom up); Ours (ResNet-50-unified-95K-bottom up); Ours (ResNet-50-unified-95K-all to one).

TABLE 6
Overall performance comparison with state-of-the-art methods on the Make3D dataset. Our approach outperforms all the competitors w.r.t. the C Error, and performs only slightly worse on the rel metric of the C Error than Laina et al. [7] using the Huber loss and significantly larger training data.
Columns: Method; C1 Error: rel, log10, rms, rms (sc-inv.); C2 Error: rel, log10, rms.
Rows: Karsch et al. [0]; Liu et al. [3]; Liu et al. [30]; Li et al. [8]; Laina et al. [7] (L2 loss); Laina et al. [7] (Huber loss); Ours (ResNet-50-cascade-bottom up); Ours (ResNet-50-unified-bottom up); Ours (ResNet-50-unified-0K-bottom up); Ours (ResNet-50-unified-0K-all to one).

dense pixel-level prediction problems. Importantly, for all the considered front-end networks there is a significant increase in performance when applying the proposed CRF-based models. Figure 6 depicts some examples of predicted depth maps using different front-end networks on the NYU Depth V2 test dataset. As the figure shows, the qualitative results confirm that deeper architectures lead to better depth recovery. By comparing the reconstructed depth maps obtained with pretrained models (e.g. using only the front-end networks VGG-CD and ResNet-50) with those generated with our multi-scale models, it is clear that our approach remarkably improves prediction accuracy and visual quality.

Fig. 7. Examples of depth prediction results on the Make3D dataset. The four rows, from top to bottom, are the input test RGB images, the results produced by Laina et al. [7], the results of our ResNet50-MSCRF model and the ground-truth depth maps, respectively.

5.3.2 Evaluation of different multi-scale CRF fusion models

To evaluate the effectiveness of the proposed CRF-based multi-scale fusion models, we conduct experiments on the NYU Depth V2 dataset and consider the following baselines: (i) the HED method in [46], where multiple side outputs are fused with a weighted averaging scheme and the sum of multiple side output losses is jointly minimized as deep supervision with a cross-entropy loss, whereas we use the square loss since our problem involves continuous variables; (ii) the Hypercolumn method [6], where multi-scale feature maps generated from different semantic network layers are concatenated and fused; (iii) a continuous CRF ("C-CRF") applied to the prediction of the front-end network, i.e. plugged in after the last output layer as a post-processing module without end-to-end training.
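To make the difference between the first two baselines concrete, the sketch below contrasts a weighted averaging of side outputs with a concatenation of multi-scale feature maps followed by a linear projection. It is a simplified NumPy illustration, not the HED or Hypercolumn implementation: the function names, the normalization of the weights and the single projection vector (standing in for a learned 1×1 convolution) are assumptions made for the example.

```python
import numpy as np


def weighted_average_fusion(side_outputs, weights):
    """Fuse L side outputs of shape (H, W) with normalized scalar weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                               # normalization is an assumption
    stacked = np.stack(side_outputs, axis=0)      # (L, H, W)
    return np.tensordot(w, stacked, axes=1)       # (H, W)


def concatenation_fusion(feature_maps, projection):
    """Concatenate feature maps of shape (C_l, H, W) along channels and
    project them to one map with a per-channel weight vector (a stand-in
    for a learned 1x1 convolution)."""
    stacked = np.concatenate(feature_maps, axis=0)  # (sum of C_l, H, W)
    return np.tensordot(np.asarray(projection, dtype=float), stacked, axes=1)


# Toy example with illustrative values only.
sides = [np.zeros((2, 2)), np.full((2, 2), 4.0)]
fused_avg = weighted_average_fusion(sides, [1.0, 3.0])

feats = [np.ones((2, 2, 2)), 2.0 * np.ones((1, 2, 2))]
fused_cat = concatenation_fusion(feats, [1.0, 1.0, 1.0])
```

Both schemes combine the scales with a fixed linear rule per pixel, whereas the CRF-based models studied here refine each scale through message passing between pixels and scales.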
With the first two baselines, we compare our models with other popular methods for fusing multi-scale CNN information, while the third one aims at demonstrating the effectiveness of the continuous CRF itself. In these experiments we consider VGG-CD as the front-end CNN architecture. The results of the comparison are shown in Table 3. It is evident that with our CRF-based fusion models (both the cascade CRFs and the unified CRFs) more accurate depth maps can be obtained, demonstrating that our idea of integrating complementary information derived from CNN side output maps within a graphical model framework is more effective than traditional fusion schemes. Table 3 also compares the proposed cascade and unified models. As expected, the unified model produces more accurate depth maps, at the price of an increased computational cost. This can also be observed from Table 2. The C-CRF (in Table 3) improves the depth estimation on all metrics over VGG-CD (pretrain) (in Table 2) by a clear gap, showing that the CRF model is very useful for refining the depth map predicted by the deep network. By jointly learning with the front-end (i.e. end-to-end training), ours (single-scale) further boosts the performance. Finally, we analyze the impact of adopting multiple scales and compare our complete models (5 scales) with their versions in which only one or three side output layers are used. It is evident that the performance can be improved by increasing the number of scales.

5.3.3 Evaluation of multi-scale message passing structures

We evaluate the influence of different multi-scale message passing structures using the cascade CRF model. The four connection structures depicted in Fig. 3 are compared. Table 4

shows the monocular depth estimation results on the NYUD-v2 dataset. The comparison confirms that the message passing structure indeed has an impact on the final performance. The bottom-up and top-down structures have similar performance, while the skip-connection structure slightly outperforms these two.

Fig. 8. Examples of depth prediction results on the KITTI raw dataset. A qualitative comparison with other depth estimation methods on this dataset is presented (columns: RGB image, ground-truth depth map, Eigen et al. [], Zhou et al. [5], Garg et al. [3], Godard et al. [5], ours). The sparse ground-truth depth maps are interpolated for better visualization.

TABLE 7
Overall performance comparison with state-of-the-art methods on the KITTI raw dataset. Our approach obtains very competitive performance over all the competitors w.r.t. all the evaluation metrics on the testing set given by Eigen et al. []. For the setting, caps means different gt/predicted depth range and stereo means using left and right images captured from two monocular cameras in the training phase. Ours uses a unified model considering both the bottom-up and the all-to-one network structure.
Metric columns: Error (lower is better): rel, sq rel, rms, rms (sc-inv.); Accuracy (higher is better): δ < 1.25, δ < 1.25^2, δ < 1.25^3.

| Method | Range | Stereo |
| Saxena et al. [4] | 0-80m | No |
| Eigen et al. [] | 0-80m | No |
| Liu et al. [30] | 0-80m | No |
| Zhou et al. [5] | 0-80m | No |
| Kuznietsov et al. [4] (only supervised) | 0-80m | No |
| Garg et al. [3] | 0-80m | Yes |
| Garg et al. [3] L + Aug 8x | -50m | Yes |
| Godard et al. [5] | 0-80m | Yes |
| Kuznietsov et al. [4] | 0-80m | Yes |
| Ours (ResNet-50 Pretrain) | 0-80m | No |
| Ours (ResNet-50 Fine-tune-bottom up) | 0-80m | No |
| Ours (ResNet-50 Fine-tune-all to one) | 0-80m | No |
The all-to-one structure performs best, producing a gain of around .0% in terms of the rel metric over the top-down structure. This suggests that directly passing messages from the remaining scales to the finest prediction scale can absorb more complementary information than the gradual passing used in the first three structures.

5.3.4 Comparison with state of the art

We also compare our approach with state-of-the-art methods on all the datasets. For previous works we directly report the results taken from the original papers. Table 5 shows the results of the comparison on the NYU Depth V2 dataset. For our approach we consider the cascade model and use two different training sets for pretraining: the small set of 4.7K pairs employed in all our experiments and a larger set of 95K images as in [7]. Note that for fine-tuning we only use the small set. As shown in the table, our approach outperforms all competing methods, and it is the second best model when we use only 4.7K images. This is remarkable considering that, for instance, in [] 120K image pairs are used for training. Our model achieves the best results on all the metrics when using 95K pretraining samples together with the proposed all-to-one message passing structure.

We also perform a comparison with several state-of-the-art methods on the Make3D dataset (Table 6). Following [3], the error metrics are computed in two different settings, i.e. considering (C1) only the regions with ground-truth depth less than 70 and (C2) the entire image. It is clear that the proposed approach is significantly better than previous methods. In particular, comparing with Laina et al. [7], the best performing method in the literature, it is evident that our approach, both with the cascade and with the multi-scale models, outperforms [7] by a significant margin when Laina et al. also adopt a square loss. It is worth noting that in [7] a training set of 5K image pairs is considered, while we employ far fewer training samples. By increasing our training data (i.e.
0K in the pretraining phase), our multi-scale CRF model also outperforms [7] with the Huber loss (log10 and rms metrics). The final performance is further boosted by considering the all-to-one structure, as on the NYUD v2 dataset. Finally, it is very interesting to compare the proposed method with the approach of Liu et al. [30], since

in [30] a CRF model is also employed within a deep network trained end-to-end. Our method significantly outperforms [30] in terms of accuracy. Moreover, in [30] a time of .sec is reported for performing inference on a test image, but the time required by the superpixel calculations is not taken into account. In contrast, with our method computing the depth map for a single image takes about sec in total.

The state-of-the-art comparison on the KITTI dataset is shown in Table 7. The competitors include Saxena et al. [39], Eigen et al. [], Liu et al. [3], Zhou et al. [5], Garg et al. [3], Godard et al. [5] and Kuznietsov et al. [4]. As in our setting, the first four methods use single monocular images in the training phase, while the last two consider two monocular images in a stereo setting for training. Among the first four competitors, Eigen et al. [] significantly outperforms the others in terms of the mean relative error (rel), due to the use of large-scale training data (more than million samples). Our model nevertheless achieves much better performance than Eigen et al. [] on all metrics with much less data (22.6K samples). Although the training of the last two methods (which requires two monocular images) is not equivalent to our setting, the proposed approach with both the bottom-up and the all-to-one structures still produces better results than these methods, with a clear performance gap on all metrics. Kuznietsov et al. [4] report results for both stereo training and monocular supervised training. Our method is not directly comparable with their stereo training setting, which is significantly different as it requires both left and right images from a binocular camera. Our method focuses on monocular depth estimation and achieves lower errors than theirs in the same monocular setting. Fig. 8 also shows some qualitative comparisons with these methods, further demonstrating the advantageous performance of our approach.

5.3.5 Qualitative depth estimation results

Fig. 6, 7 and 9 show some examples of the qualitative depth estimation results and the comparison with the competing methods on the NYUD-V2, Make3D and KITTI datasets, respectively. It is clear that the proposed approach is able to produce sharper depth estimates with better visual quality than the classic CNN structures, which demonstrates the importance of aiding the prediction with CRFs that impose appearance and smoothness constraints. Fig. 9 also shows a qualitative comparison between the pretrained front-end CNN and the fine-tuned whole model. It can be observed that our approach recovers more scene structures and details. We believe that this is probably due to the effective structured fusion of the coarse-to-fine multi-scale predictions of the deep network with the proposed CRF models. As for the influence of the variance in the CRF model on the prediction errors, the variance term actually acts as a normalization factor after the message passing. It may have some influence but, based on our observation of the experimental results, the prediction errors are dominated by the predictions of the deep front-end CNN.

Fig. 9. Examples of depth prediction results on the KITTI raw dataset. The middle column and the right column show the pretrained and the fine-tuned estimation results, respectively.

5.3.6 Empirical run-time analysis

Computational run-time complexity is an important aspect of deep structured prediction models. In this paragraph we provide a short discussion of the computational cost of the proposed CRF-based models. As shown in the paper, the multi-scale CRF model achieves better accuracy and lower error than the cascade model in both the NYU Depth V2 and the Make3D experiments. However, as expected, the cascade model is more advantageous in terms of running time. For instance, considering ResNet-50 as the front-end CNN, the time required at test phase for one image is .0 seconds for the cascade model and .45 seconds for the multi-scale model, and the image resolution is pixels. A higher resolution of the network input usually brings more computational overhead. We also test the running time given the input resolution of and it costs around .5 seconds for processing one image. We believe that if we reduce the receptive field of the CRF model from fully connected to partially connected, the computing time could be significantly reduced.

6 CONCLUSION

In this paper, we introduced a novel approach for predicting depth maps from a single RGB image. The core of the method is a novel framework based on continuous CRFs for fusing multi-scale score-level side outputs derived from different semantic CNN layers. We demonstrated that this framework can be used in combination with several common CNN architectures and can be implemented for end-to-end training. The extensive experiments confirmed the validity of the proposed multi-scale fusion approach. While this paper specifically addresses the problem of depth prediction, we believe that other computer vision tasks involving pixel-level predictions of continuous variables can also benefit from our implementation of the mean-field updates within the CNN framework. Currently, the multi-scale fusion is performed at the score level. Further research will investigate the integration of both feature-level and score-level multi-scale information within a unified graphical model. Moreover, the study of strategies for further improving the training and testing efficiency of the CNN-CRF models will also be an interesting aspect of future work. Monocular depth estimation is particularly useful for various cross-modal recognition and detection tasks. A straightforward follow-up of this work would be designing a joint multi-task deep model to transfer the learned depth model for


More information

CNN and RNN Based Neural Networks for Action Recognition

CNN and RNN Based Neural Networks for Action Recognition Journa of Physics: Conference Series PAPER OPEN ACCESS CNN and RNN Based Neura Networks for Action Recognition To cite this artice: Chen Zhao et a 2018 J. Phys.: Conf. Ser. 1087 062013 View the artice

More information

Dialectical GAN for SAR Image Translation: From Sentinel-1 to TerraSAR-X

Dialectical GAN for SAR Image Translation: From Sentinel-1 to TerraSAR-X Artice Diaectica GAN for SAR Image Transation: From Sentine-1 to TerraSAR-X Dongyang Ao 1,2, Corneiu Octavian Dumitru 1,Gottfried Schwarz 1 and Mihai Datcu 1, * 1 German Aerospace Center (DLR), Münchener

More information

QoS-Aware Data Transmission and Wireless Energy Transfer: Performance Modeling and Optimization

QoS-Aware Data Transmission and Wireless Energy Transfer: Performance Modeling and Optimization QoS-Aware Data Transmission and Wireess Energy Transfer: Performance Modeing and Optimization Dusit Niyato, Ping Wang, Yeow Wai Leong, and Tan Hwee Pink Schoo of Computer Engineering, Nanyang Technoogica

More information

MACHINE learning techniques can, automatically,

MACHINE learning techniques can, automatically, Proceedings of Internationa Joint Conference on Neura Networks, Daas, Texas, USA, August 4-9, 203 High Leve Data Cassification Based on Network Entropy Fiipe Aves Neto and Liang Zhao Abstract Traditiona

More information

The Classification of Stored Grain Pests based on Convolutional Neural Network

The Classification of Stored Grain Pests based on Convolutional Neural Network 2017 2nd Internationa Conference on Mechatronics and Information Technoogy (ICMIT 2017) The Cassification of Stored Grain Pests based on Convoutiona Neura Network Dexian Zhang1, Wenun Zhao*, 1 1 Schoo

More information

NestedNet: Learning Nested Sparse Structures in Deep Neural Networks

NestedNet: Learning Nested Sparse Structures in Deep Neural Networks NestedNet: Learning Nested Sparse Structures in Deep Neura Networks Eunwoo Kim Chanho Ahn Songhwai Oh Department of ECE and ASRI, Seou Nationa University, South Korea {kewoo15, mychahn, songhwai}@snu.ac.kr

More information

Load Balancing by MPLS in Differentiated Services Networks

Load Balancing by MPLS in Differentiated Services Networks Load Baancing by MPLS in Differentiated Services Networks Riikka Susitaiva, Jorma Virtamo, and Samui Aato Networking Laboratory, Hesinki University of Technoogy P.O.Box 3000, FIN-02015 HUT, Finand {riikka.susitaiva,

More information

Special Edition Using Microsoft Excel Selecting and Naming Cells and Ranges

Special Edition Using Microsoft Excel Selecting and Naming Cells and Ranges Specia Edition Using Microsoft Exce 2000 - Lesson 3 - Seecting and Naming Ces and.. Page 1 of 8 [Figures are not incuded in this sampe chapter] Specia Edition Using Microsoft Exce 2000-3 - Seecting and

More information

Providing Hop-by-Hop Authentication and Source Privacy in Wireless Sensor Networks

Providing Hop-by-Hop Authentication and Source Privacy in Wireless Sensor Networks The 31st Annua IEEE Internationa Conference on Computer Communications: Mini-Conference Providing Hop-by-Hop Authentication and Source Privacy in Wireess Sensor Networks Yun Li Jian Li Jian Ren Department

More information

On Trivial Solution and High Correlation Problems in Deep Supervised Hashing

On Trivial Solution and High Correlation Problems in Deep Supervised Hashing On Trivia Soution and High Correation Probems in Deep Supervised Hashing Yuchen Guo, Xin Zhao, Guiguang Ding, Jungong Han Schoo of Software, Tsinghua University, Beijing 84, China Schoo of Computing and

More information

AUTOMATIC gender classification based on facial images

AUTOMATIC gender classification based on facial images SUBMITTED TO IEEE TRANSACTIONS ON NEURAL NETWORKS 1 Gender Cassification Using a Min-Max Moduar Support Vector Machine with Incorporating Prior Knowedge Hui-Cheng Lian and Bao-Liang Lu, Senior Member,

More information

DETERMINING INTUITIONISTIC FUZZY DEGREE OF OVERLAPPING OF COMPUTATION AND COMMUNICATION IN PARALLEL APPLICATIONS USING GENERALIZED NETS

DETERMINING INTUITIONISTIC FUZZY DEGREE OF OVERLAPPING OF COMPUTATION AND COMMUNICATION IN PARALLEL APPLICATIONS USING GENERALIZED NETS DETERMINING INTUITIONISTIC FUZZY DEGREE OF OVERLAPPING OF COMPUTATION AND COMMUNICATION IN PARALLEL APPLICATIONS USING GENERALIZED NETS Pave Tchesmedjiev, Peter Vassiev Centre for Biomedica Engineering,

More information

Outline. Parallel Numerical Algorithms. Forward Substitution. Triangular Matrices. Solving Triangular Systems. Back Substitution. Parallel Algorithm

Outline. Parallel Numerical Algorithms. Forward Substitution. Triangular Matrices. Solving Triangular Systems. Back Substitution. Parallel Algorithm Outine Parae Numerica Agorithms Chapter 8 Prof. Michae T. Heath Department of Computer Science University of Iinois at Urbana-Champaign CS 554 / CSE 512 1 2 3 4 Trianguar Matrices Michae T. Heath Parae

More information

MULTIGRID REDUCTION IN TIME FOR NONLINEAR PARABOLIC PROBLEMS: A CASE STUDY

MULTIGRID REDUCTION IN TIME FOR NONLINEAR PARABOLIC PROBLEMS: A CASE STUDY MULTIGRID REDUCTION IN TIME FOR NONLINEAR PARABOLIC PROBLEMS: A CASE STUDY R.D. FALGOUT, T.A. MANTEUFFEL, B. O NEILL, AND J.B. SCHRODER Abstract. The need for paraeism in the time dimension is being driven

More information

AUTOMATIC IMAGE RETARGETING USING SALIENCY BASED MESH PARAMETERIZATION

AUTOMATIC IMAGE RETARGETING USING SALIENCY BASED MESH PARAMETERIZATION S.Sai Kumar et a. / (IJCSIT Internationa Journa of Computer Science and Information Technoogies, Vo. 1 (4, 010, 73-79 AUTOMATIC IMAGE RETARGETING USING SALIENCY BASED MESH PARAMETERIZATION 1 S.Sai Kumar,

More information

Multi-task hidden Markov modeling of spectrogram feature from radar high-resolution range profiles

Multi-task hidden Markov modeling of spectrogram feature from radar high-resolution range profiles http://asp.eurasipjournas.com/content/22//86 RESEARCH Open Access Muti-task hidden Markov modeing of spectrogram feature from radar high-resoution range profies Mian Pan, Lan Du *, Penghui Wang, Hongwei

More information

A Two-Step Approach to Hallucinating Faces: Global Parametric Model and Local Nonparametric Model

A Two-Step Approach to Hallucinating Faces: Global Parametric Model and Local Nonparametric Model A Two-Step Approach to aucinating Faces: Goba Parametric Mode and Loca Nonparametric Mode Ce Liu eung-yeung Shum Chang-Shui Zhang State Key Lab of nteigent Technoogy and Systems, Dept. of Automation, Tsinghua

More information

arxiv: v1 [cs.cv] 29 Jul 2018

arxiv: v1 [cs.cv] 29 Jul 2018 Joint Representation and Truncated Inference Learning for Correation Fiter based Tracking Yingjie Yao 1[0000 000 3533 1569], Xiaohe Wu 1[0000 0001 6884 911], Lei Zhang [0000 000 444 494], Shiguang Shan

More information

Performance Enhancement of 2D Face Recognition via Mosaicing

Performance Enhancement of 2D Face Recognition via Mosaicing Performance Enhancement of D Face Recognition via Mosaicing Richa Singh, Mayank Vatsa, Arun Ross, Afze Noore West Virginia University, Morgantown, WV 6506 {richas, mayankv, ross, noore}@csee.wvu.edu Abstract

More information

On Upper Bounds for Assortment Optimization under the Mixture of Multinomial Logit Models

On Upper Bounds for Assortment Optimization under the Mixture of Multinomial Logit Models On Upper Bounds for Assortment Optimization under the Mixture of Mutinomia Logit Modes Sumit Kunnumka September 30, 2014 Abstract The assortment optimization probem under the mixture of mutinomia ogit

More information

DETECTION OF OBSTACLE AND FREESPACE IN AN AUTONOMOUS WHEELCHAIR USING A STEREOSCOPIC CAMERA SYSTEM

DETECTION OF OBSTACLE AND FREESPACE IN AN AUTONOMOUS WHEELCHAIR USING A STEREOSCOPIC CAMERA SYSTEM DETECTION OF OBSTACLE AND FREESPACE IN AN AUTONOMOUS WHEELCHAIR USING A STEREOSCOPIC CAMERA SYSTEM Le Minh 1, Thanh Hai Nguyen 2, Tran Nghia Khanh 2, Vo Văn Toi 2, Ngo Van Thuyen 1 1 University of Technica

More information

MULTITASK MULTIVARIATE COMMON SPARSE REPRESENTATIONS FOR ROBUST MULTIMODAL BIOMETRICS RECOGNITION. Heng Zhang, Vishal M. Patel and Rama Chellappa

MULTITASK MULTIVARIATE COMMON SPARSE REPRESENTATIONS FOR ROBUST MULTIMODAL BIOMETRICS RECOGNITION. Heng Zhang, Vishal M. Patel and Rama Chellappa MULTITASK MULTIVARIATE COMMON SPARSE REPRESENTATIONS FOR ROBUST MULTIMODAL BIOMETRICS RECOGNITION Heng Zhang, Visha M. Pate and Rama Cheappa Center for Automation Research University of Maryand, Coage

More information

Complex Human Activity Searching in a Video Employing Negative Space Analysis

Complex Human Activity Searching in a Video Employing Negative Space Analysis Compex Human Activity Searching in a Video Empoying Negative Space Anaysis Shah Atiqur Rahman, Siu-Yeung Cho, M.K.H. Leung 3, Schoo of Computer Engineering, Nanyang Technoogica University, Singapore 639798

More information

Efficient method to design RF pulses for parallel excitation MRI using gridding and conjugate gradient

Efficient method to design RF pulses for parallel excitation MRI using gridding and conjugate gradient Origina rtice Efficient method to design RF puses for parae excitation MRI using gridding and conjugate gradient Shuo Feng, Jim Ji Department of Eectrica & Computer Engineering, Texas & M University, Texas,

More information

A Comparison of a Second-Order versus a Fourth- Order Laplacian Operator in the Multigrid Algorithm

A Comparison of a Second-Order versus a Fourth- Order Laplacian Operator in the Multigrid Algorithm A Comparison of a Second-Order versus a Fourth- Order Lapacian Operator in the Mutigrid Agorithm Kaushik Datta (kdatta@cs.berkeey.edu Math Project May 9, 003 Abstract In this paper, the mutigrid agorithm

More information

Neural Networks. Aarti Singh. Machine Learning Nov 3, Slides Courtesy: Tom Mitchell

Neural Networks. Aarti Singh. Machine Learning Nov 3, Slides Courtesy: Tom Mitchell Neura Networks Aarti Singh Machine Learning 10-601 Nov 3, 2011 Sides Courtesy: Tom Mitche 1 Logis0c Regression Assumes the foowing func1ona form for P(Y X): Logis1c func1on appied to a inear func1on of

More information

arxiv: v1 [cs.cv] 28 Sep 2015

arxiv: v1 [cs.cv] 28 Sep 2015 Fast Non-oca Stereo Matching based on Hierarchica Disparity Prediction Xuan Luo, Xuejiao Bai, Shuo Li, Hongtao Lu Shanghai Jiao Tong University, No. 800, Dongchuan Road, Shanghai, China {roxanneuo, yukiaya,

More information

Deep Quantization Network for Efficient Image Retrieval

Deep Quantization Network for Efficient Image Retrieval Proceedings of the Thirtieth AAAI Conference on Artificia Inteigence (AAAI-16) Deep Quantiation Network for Efficient Image Retrieva Yue Cao, Mingsheng Long, Jianmin Wang, Han Zhu and Qingfu Wen Schoo

More information

Real-Time Feature Descriptor Matching via a Multi-Resolution Exhaustive Search Method

Real-Time Feature Descriptor Matching via a Multi-Resolution Exhaustive Search Method 297 Rea-Time Feature escriptor Matching via a Muti-Resoution Ehaustive Search Method Chi-Yi Tsai, An-Hung Tsao, and Chuan-Wei Wang epartment of Eectrica Engineering, Tamang University, New Taipei City,

More information

A Memory Grouping Method for Sharing Memory BIST Logic

A Memory Grouping Method for Sharing Memory BIST Logic A Memory Grouping Method for Sharing Memory BIST Logic Masahide Miyazai, Tomoazu Yoneda, and Hideo Fuiwara Graduate Schoo of Information Science, Nara Institute of Science and Technoogy (NAIST), 8916-5

More information

Learning to Learn Second-Order Back-Propagation for CNNs Using LSTMs

Learning to Learn Second-Order Back-Propagation for CNNs Using LSTMs Learning to Learn Second-Order Bac-Propagation for CNNs Using LSTMs Anirban Roy SRI Internationa Meno Par, USA anirban.roy@sri.com Sinisa Todorovic Oregon State University Corvais, USA sinisa@eecs.oregonstate.edu

More information

Image Segmentation Using Semi-Supervised k-means

Image Segmentation Using Semi-Supervised k-means I J C T A, 9(34) 2016, pp. 595-601 Internationa Science Press Image Segmentation Using Semi-Supervised k-means Reza Monsefi * and Saeed Zahedi * ABSTRACT Extracting the region of interest is a very chaenging

More information

A Fast-Convergence Decoding Method and Memory-Efficient VLSI Decoder Architecture for Irregular LDPC Codes in the IEEE 802.

A Fast-Convergence Decoding Method and Memory-Efficient VLSI Decoder Architecture for Irregular LDPC Codes in the IEEE 802. A Fast-Convergence Decoding Method and Memory-Efficient VLSI Decoder Architecture for Irreguar LDPC Codes in the IEEE 82.16e Standards Yeong-Luh Ueng and Chung-Chao Cheng Dept. of Eectrica Engineering,

More information

Design of IP Networks with End-to. to- End Performance Guarantees

Design of IP Networks with End-to. to- End Performance Guarantees Design of IP Networks with End-to to- End Performance Guarantees Irena Atov and Richard J. Harris* ( Swinburne University of Technoogy & *Massey University) Presentation Outine Introduction Mutiservice

More information

Joint disparity and motion eld estimation in. stereoscopic image sequences. Ioannis Patras, Nikos Alvertos and Georgios Tziritas y.

Joint disparity and motion eld estimation in. stereoscopic image sequences. Ioannis Patras, Nikos Alvertos and Georgios Tziritas y. FORTH-ICS / TR-157 December 1995 Joint disparity and motion ed estimation in stereoscopic image sequences Ioannis Patras, Nikos Avertos and Georgios Tziritas y Abstract This work aims at determining four

More information

Action Recognition by Learning Deep Multi-Granular Spatio-Temporal Video Representation

Action Recognition by Learning Deep Multi-Granular Spatio-Temporal Video Representation Action Recognition by Learning Deep Muti-Granuar Spatio-Tempora Video Representation Qing Li 1, Zhaofan Qiu 1, Ting Yao 2, Tao Mei 2, Yong Rui 2, Jiebo Luo 3 1 University of Science and Technoogy of China,

More information

IEEE TRANSACTIONS ON CYBERNETICS 1. Shangfei Wang, Senior Member, IEEE, BowenPan, Huaping Chen, and Qiang Ji, Fellow, IEEE

IEEE TRANSACTIONS ON CYBERNETICS 1. Shangfei Wang, Senior Member, IEEE, BowenPan, Huaping Chen, and Qiang Ji, Fellow, IEEE This artice has been accepted for incusion in a future issue of this journa. Content is fina as presented, with the exception of pagination. IEEE TRANSACTIONS ON CYBERNETICS 1 Therma Augmented Expression

More information

A study of comparative evaluation of methods for image processing using color features

A study of comparative evaluation of methods for image processing using color features A study of comparative evauation of methods for image processing using coor features FLORENTINA MAGDA ENESCU,CAZACU DUMITRU Department Eectronics, Computers and Eectrica Engineering University Pitești

More information

Application of Intelligence Based Genetic Algorithm for Job Sequencing Problem on Parallel Mixed-Model Assembly Line

Application of Intelligence Based Genetic Algorithm for Job Sequencing Problem on Parallel Mixed-Model Assembly Line American J. of Engineering and Appied Sciences 3 (): 5-24, 200 ISSN 94-7020 200 Science Pubications Appication of Inteigence Based Genetic Agorithm for Job Sequencing Probem on Parae Mixed-Mode Assemby

More information

Binarized support vector machines

Binarized support vector machines Universidad Caros III de Madrid Repositorio instituciona e-archivo Departamento de Estadística http://e-archivo.uc3m.es DES - Working Papers. Statistics and Econometrics. WS 2007-11 Binarized support vector

More information

Minimizing Resource Cost for Camera Stream Scheduling in Video Data Center

Minimizing Resource Cost for Camera Stream Scheduling in Video Data Center Gao YH, Ma HD, Liu W. Minimizing resource cost for camera stream scheduing in video data center. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 32(3): 555 570 May 2017. DOI 10.1007/s11390-017-1743-x Minimizing

More information

A METHOD FOR GRIDLESS ROUTING OF PRINTED CIRCUIT BOARDS. A. C. Finch, K. J. Mackenzie, G. J. Balsdon, G. Symonds

A METHOD FOR GRIDLESS ROUTING OF PRINTED CIRCUIT BOARDS. A. C. Finch, K. J. Mackenzie, G. J. Balsdon, G. Symonds A METHOD FOR GRIDLESS ROUTING OF PRINTED CIRCUIT BOARDS A C Finch K J Mackenzie G J Basdon G Symonds Raca-Redac Ltd Newtown Tewkesbury Gos Engand ABSTRACT The introduction of fine-ine technoogies to printed

More information

PHASE retrieval has been an active research topic for decades [1], [2]. The underlying goal is to estimate an unknown

PHASE retrieval has been an active research topic for decades [1], [2]. The underlying goal is to estimate an unknown DOLPHIn Dictionary Learning for Phase Retrieva Andreas M. Timann, Yonina C. Edar, Feow, IEEE, and Juien Maira, Member, IEEE arxiv:60.063v [math.oc] 3 Aug 06 Abstract We propose a new agorithm to earn a

More information

Community-Aware Opportunistic Routing in Mobile Social Networks

Community-Aware Opportunistic Routing in Mobile Social Networks IEEE TRANSACTIONS ON COMPUTERS VOL:PP NO:99 YEAR 213 Community-Aware Opportunistic Routing in Mobie Socia Networks Mingjun Xiao, Member, IEEE Jie Wu, Feow, IEEE, and Liusheng Huang, Member, IEEE Abstract

More information

A Design Method for Optimal Truss Structures with Certain Redundancy Based on Combinatorial Rigidity Theory

A Design Method for Optimal Truss Structures with Certain Redundancy Based on Combinatorial Rigidity Theory 0 th Word Congress on Structura and Mutidiscipinary Optimization May 9 -, 03, Orando, Forida, USA A Design Method for Optima Truss Structures with Certain Redundancy Based on Combinatoria Rigidity Theory

More information

Massively Parallel Part of Speech Tagging Using. Min-Max Modular Neural Networks.

Massively Parallel Part of Speech Tagging Using. Min-Max Modular Neural Networks. assivey Parae Part of Speech Tagging Using in-ax oduar Neura Networks Bao-Liang Lu y, Qing a z, ichinori Ichikawa y, & Hitoshi Isahara z y Lab. for Brain-Operative Device, Brain Science Institute, RIEN

More information

Collinearity and Coplanarity Constraints for Structure from Motion

Collinearity and Coplanarity Constraints for Structure from Motion Coinearity and Copanarity Constraints for Structure from Motion Gang Liu 1, Reinhard Kette 2, and Bodo Rosenhahn 3 1 Institute of Information Sciences and Technoogy, Massey University, New Zeaand, Department

More information

Fuzzy Perceptual Watermarking For Ownership Verification

Fuzzy Perceptual Watermarking For Ownership Verification Fuzzy Perceptua Watermarking For Ownership Verification Mukesh Motwani 1 and Frederick C. Harris, Jr. 1 1 Computer Science & Engineering Department, University of Nevada, Reno, NV, USA Abstract - An adaptive

More information

Modelling and Performance Evaluation of Router Transparent Web cache Mode

Modelling and Performance Evaluation of Router Transparent Web cache Mode Emad Hassan A-Hemiary IJCSET Juy 2012 Vo 2, Issue 7,1316-1320 Modeing and Performance Evauation of Transparent cache Mode Emad Hassan A-Hemiary Network Engineering Department, Coege of Information Engineering,

More information

CLOUD RADIO ACCESS NETWORK WITH OPTIMIZED BASE-STATION CACHING

CLOUD RADIO ACCESS NETWORK WITH OPTIMIZED BASE-STATION CACHING CLOUD RADIO ACCESS NETWORK WITH OPTIMIZED BASE-STATION CACHING Binbin Dai and Wei Yu Ya-Feng Liu Department of Eectrica and Computer Engineering University of Toronto, Toronto ON, Canada M5S 3G4 Emais:

More information

1682 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 22, NO. 6, DECEMBER Backward Fuzzy Rule Interpolation

1682 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 22, NO. 6, DECEMBER Backward Fuzzy Rule Interpolation 1682 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 22, NO. 6, DECEMBER 2014 Bacward Fuzzy Rue Interpoation Shangzhu Jin, Ren Diao, Chai Que, Senior Member, IEEE, and Qiang Shen Abstract Fuzzy rue interpoation

More information

Multi-level Shape Recognition based on Wavelet-Transform. Modulus Maxima

Multi-level Shape Recognition based on Wavelet-Transform. Modulus Maxima uti-eve Shape Recognition based on Waveet-Transform oduus axima Faouzi Aaya Cheikh, Azhar Quddus and oncef Gabbouj Tampere University of Technoogy (TUT), Signa Processing aboratory, P.O. Box 553, FIN-33101

More information

Deep Fisher Networks for Large-Scale Image Classification

Deep Fisher Networks for Large-Scale Image Classification Deep Fisher Networs for Large-Scae Image Cassification Karen Simonyan Andrea Vedadi Andrew Zisserman Visua Geometry Group, University of Oxford {aren,vedadi,az}@robots.ox.ac.u Abstract As massivey parae

More information

Adaptive 360 VR Video Streaming: Divide and Conquer!

Adaptive 360 VR Video Streaming: Divide and Conquer! Adaptive 360 VR Video Streaming: Divide and Conquer! Mohammad Hosseini *, Viswanathan Swaminathan * University of Iinois at Urbana-Champaign (UIUC) Adobe Research, San Jose, USA Emai: shossen2@iinois.edu,

More information

Deformation-based interactive texture design using energy optimization

Deformation-based interactive texture design using energy optimization Visua Comput (2007) 23: 631 639 DOI 10.1007/s00371-007-0154-3 ORIGINAL ARTICLE Jianbing Shen Xiaogang Jin Xiaoyang Mao Jieqing Feng Deformation-based interactive texture design using energy optimization

More information

Neural Networks. Aarti Singh & Barnabas Poczos. Machine Learning / Apr 24, Slides Courtesy: Tom Mitchell

Neural Networks. Aarti Singh & Barnabas Poczos. Machine Learning / Apr 24, Slides Courtesy: Tom Mitchell Neura Networks Aarti Singh & Barnabas Poczos Machine Learning 10-701/15-781 Apr 24, 2014 Sides Courtesy: Tom Mitche 1 Logis0c Regression Assumes the foowing func1ona form for P(Y X): Logis1c func1on appied

More information

Fastest-Path Computation

Fastest-Path Computation Fastest-Path Computation DONGHUI ZHANG Coege of Computer & Information Science Northeastern University Synonyms fastest route; driving direction Definition In the United states, ony 9.% of the househods

More information

Joint Optimization of Intra- and Inter-Autonomous System Traffic Engineering

Joint Optimization of Intra- and Inter-Autonomous System Traffic Engineering Joint Optimization of Intra- and Inter-Autonomous System Traffic Engineering Kin-Hon Ho, Michae Howarth, Ning Wang, George Pavou and Styianos Georgouas Centre for Communication Systems Research, University

More information

Lecture Notes for Chapter 4 Part III. Introduction to Data Mining

Lecture Notes for Chapter 4 Part III. Introduction to Data Mining Data Mining Cassification: Basic Concepts, Decision Trees, and Mode Evauation Lecture Notes for Chapter 4 Part III Introduction to Data Mining by Tan, Steinbach, Kumar Adapted by Qiang Yang (2010) Tan,Steinbach,

More information

An improved distributed version of Han s method for distributed MPC of canal systems

An improved distributed version of Han s method for distributed MPC of canal systems Deft University of Technoogy Deft Center for Systems and Contro Technica report 10-013 An improved distributed version of Han s method for distributed MPC of cana systems M.D. Doan, T. Keviczky, and B.

More information

FREE-FORM ANISOTROPY: A NEW METHOD FOR CRACK DETECTION ON PAVEMENT SURFACE IMAGES

FREE-FORM ANISOTROPY: A NEW METHOD FOR CRACK DETECTION ON PAVEMENT SURFACE IMAGES FREE-FORM ANISOTROPY: A NEW METHOD FOR CRACK DETECTION ON PAVEMENT SURFACE IMAGES Tien Sy Nguyen, Stéphane Begot, Forent Ducuty, Manue Avia To cite this version: Tien Sy Nguyen, Stéphane Begot, Forent

More information

TIME of Flight (ToF) cameras are active range sensors

TIME of Flight (ToF) cameras are active range sensors 140 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 36, NO. 7, JULY 014 Stereo Time-of-Fight with Constructive Interference Victor Castañeda, Diana Mateus, and Nassir Navab Abstract

More information

Quality of Service Evaluations of Multicast Streaming Protocols *

Quality of Service Evaluations of Multicast Streaming Protocols * Quaity of Service Evauations of Muticast Streaming Protocos Haonan Tan Derek L. Eager Mary. Vernon Hongfei Guo omputer Sciences Department University of Wisconsin-Madison, USA {haonan, vernon, guo}@cs.wisc.edu

More information

Space-Time Trade-offs.

Space-Time Trade-offs. Space-Time Trade-offs. Chethan Kamath 03.07.2017 1 Motivation An important question in the study of computation is how to best use the registers in a CPU. In most cases, the amount of registers avaiabe

More information

Optimization and Application of Support Vector Machine Based on SVM Algorithm Parameters

Optimization and Application of Support Vector Machine Based on SVM Algorithm Parameters Optimization and Appication of Support Vector Machine Based on SVM Agorithm Parameters YAN Hui-feng 1, WANG Wei-feng 1, LIU Jie 2 1 ChongQing University of Posts and Teecom 400065, China 2 Schoo Of Civi

More information

Learning Depth from Single Images with Deep Neural Network Embedding Focal Length

Learning Depth from Single Images with Deep Neural Network Embedding Focal Length Learning Depth from Single Images with Deep Neural Network Embedding Focal Length Lei He, Guanghui Wang (Senior Member, IEEE) and Zhanyi Hu arxiv:1803.10039v1 [cs.cv] 27 Mar 2018 Abstract Learning depth

More information

Fuzzy Equivalence Relation Based Clustering and Its Use to Restructuring Websites Hyperlinks and Web Pages

Fuzzy Equivalence Relation Based Clustering and Its Use to Restructuring Websites Hyperlinks and Web Pages Fuzzy Equivaence Reation Based Custering and Its Use to Restructuring Websites Hyperinks and Web Pages Dimitris K. Kardaras,*, Xenia J. Mamakou, and Bi Karakostas 2 Business Informatics Laboratory, Dept.

More information

Chapter 5 Combinational ATPG

Chapter 5 Combinational ATPG Chapter 5 Combinationa ATPG 2 Outine Introduction to ATPG ATPG for Combinationa Circuits Advanced ATPG Techniques 3 Input and Output of an ATPG ATPG (Automatic Test Pattern Generation) Generate a set of

More information

Proceedings of the International Conference on Systolic Arrays, San Diego, California, U.S.A., May 25-27, 1988 AN EFFICIENT ASYNCHRONOUS MULTIPLIER!

Proceedings of the International Conference on Systolic Arrays, San Diego, California, U.S.A., May 25-27, 1988 AN EFFICIENT ASYNCHRONOUS MULTIPLIER! [1,2] have, in theory, revoutionized cryptography. Unfortunatey, athough offer many advantages over conventiona and authentication), such cock synchronization in this appication due to the arge operand

More information

FACE RECOGNITION WITH HARMONIC DE-LIGHTING. s: {lyqing, sgshan, wgao}jdl.ac.cn

FACE RECOGNITION WITH HARMONIC DE-LIGHTING.  s: {lyqing, sgshan, wgao}jdl.ac.cn FACE RECOGNITION WITH HARMONIC DE-LIGHTING Laiyun Qing 1,, Shiguang Shan, Wen Gao 1, 1 Graduate Schoo, CAS, Beijing, China, 100080 ICT-ISVISION Joint R&D Laboratory for Face Recognition, CAS, Beijing,

More information