Exploring Context with Deep Structured models for Semantic Segmentation

Size: px
Start display at page:

Download "Exploring Context with Deep Structured models for Semantic Segmentation"

Transcription

1 1 Exploring Context with Deep Structure moels for Semantic Segmentation Guosheng Lin, Chunhua Shen, Anton van en Hengel, Ian Rei between an image patch an a large backgroun image region. Explicitly moeling the patch-patch contextual relations has not been well stuie in recent CNN-base segmentation methos. In this work, we propose to explicitly moel the contextual relations using conitional ranom fiels (CRFs). We formulate CNN-base pairwise potential functions to capture semantic correlations between neighboring patches. Some recent methos combine CNNs an CRFs for semantic segmentation, e.g., the ense CRFs applie in [3], [5], [6], [7]. The purpose of applying the ense CRFs in these methos is to refine the upsample lowresolution preiction to sharpen object/region bounaries. These methos consier Potts-moel-base pairwise potentials for enforcing local smoothness. There the pairwise potentials are conventional log-linear functions. In contrast, here we learn more general pairwise potentials using CNNs to moel the semantic compatibility between image regions. Our CNN pairwise potentials aim to improve the coarse-level preiction rather than merely encouraging local smoothness, an thus have a ifferent purpose compare to Potts-moel-base pairwise potentials. Given that these two types of potentials make ifferent effects, they can be combine to improve segmentation results. Fig. 1 illustrates the preiction process of our metho. In contrast to patch-patch context, patch-backgroun context is wiely explore in the literature. For CNN-base methos, backgroun information can be effectively capture by combining features from a multi-scale image network input, an has shown goo performance in some recent segmentation methos [1], [8]. A special case of capturing patch-backgroun context is consiering the whole image as the backgroun region an incorporating the imagelevel label information into learning. In our approach, to encoe rich backgroun information, we construct multiarxiv: v2 [cs.cv] 26 Mar 2016 Abstract State-of-the-art semantic image segmentation methos are mostly base on training eep convolutional neural networks (CNNs). In this work, we proffer to improve semantic segmentation with the use of contextual information. In particular, we explore patch-patch context an patch-backgroun context in eep CNNs. We formulate eep structure moels by combining CNNs an Conitional Ranom Fiels (CRFs) for learning the patch-patch context between image regions. Specifically, we formulate CNN-base pairwise potential functions to capture semantic correlations between neighboring patches. Efficient piecewise training of the propose eep structure moel is then applie in orer to avoi repeate expensive CRF inference uring the course of back propagation. For capturing the patch-backgroun context, we show that a network esign with traitional multi-scale image inputs an sliing pyrami pooling is very effective for improving performance. We perform comprehensive evaluation of the propose metho. We achieve new state-of-the-art performance on a number of challenging semantic segmentation atasets incluing NYUDv2, PASCAL-VOC2012, Cityscapes, PASCAL-Context, SUN-RGBD, SIFT-flow, an KITTI atasets. Particularly, we report an intersection-over-union score of 77.8 on the PASCAL-VOC2012 ataset. Inex Terms Semantic Segmentation, Convolutional Neural Networks, Conitional Ranom Fiels, Contextual Moels 1 INTRODUCTION Semantic image segmentation aims to preict a category label for every image pixel, which is an important yet challenging task for image unerstaning. Recent approaches have applie convolutional neural network (CNNs) [1], [2], [3] to this pixel-level labeling task an achieve remarkable success. Among these CNN-base methos, fully convolutional neural networks (FCNNs) [2], [3] have become a popular choice, because of their computational efficiency for ense preiction an en-to-en style learning. Contextual relationships are ubiquitous an provie important cues for scene unerstaning tasks. Spatial context can be formulate in terms of semantic compatibility relations between one object an its neighboring objects or image patches (stuff), in which a compatibility relation is an inication of the co-occurrence of visual patterns. For example, a car is likely to appear over a roa, an a glass is likely to appear over a table. Context can also encoe incompatibility relations. For example, a boat is unlikely to appear on a roa. These relations also exist at finer scales, for example, in object part-to-part relations, an part-toobject relations. In some cases, contextual information is the most important cue, particularly when a single object shows significant visual ambiguities. A more etaile iscussion of the value of spatial context can be foun in [4]. We explore two types of spatial context to improve the segmentation performance: patch-patch context an patchbackgroun context. The patch-patch context is the semantic relation between the visual patterns of two image patches. Likewise, patch-backgroun context the semantic relation Authors are with School of Computer Science, The University of Aelaie, Australia; an ARC Centre of Excellence for Robotic Vision. {guosheng.lin, chunhua.shen, anton.vanenhengel, ian.rei}@aelaie.eu.au

2 2 Unary potential net: Multi-scale CNN... Pairwise potential net Multi-scale CNN... Coarse-level preiction stage: Preiction refinement stage: CRF Inference Up-sample & Refine Deep structure moel: Contextual eep CRF Low-resolution preiction Final preiction Fig. 1 An illustration of the preiction process of our metho. Both our unary an pairwise potentials are formulate as multi-scale CNNs for capturing semantic relations between image regions. Our metho outputs low-resolution preiction after performing CRF inference, then the preiction is up-sample an refine in a stanar post-processing stage to output the final preiction. scale networks an apply sliing pyrami pooling on feature maps. The traitional pyrami pooling (in a sliing manner) on the feature map is able to capture information from backgroun regions of ifferent sizes. Incorporating general pairwise potentials usually involves computationally expensive inference, which brings challenges for CRF learning. To facilitate efficient learning we apply piecewise training of the CRF [9] to avoi repeate inference uring back propagation training of the eep moel. Thus our main contributions are as follows: 1. We formulate CNN-base general pairwise potential functions in CRFs to explicitly moel patch-patch semantic relations. 2. Deep CNN-base general pairwise potentials are challenging for efficient CNN-CRF joint learning. We perform approximate training, using piecewise training of CRFs [9], to avoi the repeate inference at every stochastic graient escent iteration an thus achieve efficient learning. 3. We explore backgroun context by applying a network architecture with traitional multi-scale image input [1] an sliing pyrami pooling [10]. We empirically emonstrate the effectiveness of this network architecture for semantic segmentation. 4. We set new state-of-the-art performance on a number of challenging semantic segmentation atasets, incluing NYUDv2, PASCAL VOC 2012, PASCAL-Context, SIFT-flow, SUN-RGBD, Cityscapes ataset an so on. In particular, we achieve an intersection-over-union score of 77.8 on the PASCAL VOC 2012 ataset. 1.1 Relate work Preliminary results of our work appeare in [11]. Exploiting contextual information has been wiely stuie in the literature (e.g., [4], [12], [13]). For example, early work of TAS [4] moels ifferent types of spatial context between Things an Stuff using a generative graphical moel. The most successful recent methos for semantic image segmentation are base on CNNs. A number of these CNNbase methos are region-proposal-base methos [14], [15], which first generate region proposals an then assign category labels to each of them. Very recently, FCNNs [2], [3], [7] have become a popular choice for their efficient feature generation an en-to-en training. FCNNs have also been applie to a range of other ense-preiction tasks recently, such as image restoration [16], image superresolution [17] an epth estimation [18], [19], [20]. The metho that we propose here is also built upon fully convolution-style networks. FCNN methos make use of the Image-Net traine CNN moels (e.g., the VGG-16 moel [21]) which takes avantages of the large Image-Net ataset for learning eep moels. For convolution an pooling layers, the resolution of the output feature map is own-sample if the convolution/pooling strie is greater than 1. Usually a few such layers use a strie setting of 2, hence the irect preictions of FCNNs are typically in low resolution. To increase the preiction resolution, the naive metho of irectly reucing the stries for all layers is not able to aress this ownsample preiction for a eep network. Small stries result in prohibitively expensive computation for a eep network, an also reuce the view-of-fiel (the image region that a filter is able to see ) of the network layers. Network layers with insufficient view-of-fiel may not be able to capture high-level semantic patterns an thus egrae the performance. To aress this low-resolution preiction issue, a variety of FCNN base methos are propose very recently which focus on refining the low-resolution preiction to obtain high resolution preiction. DeepLab-CRF [3] performs bilinear upsampling of the preiction score map to the input image size an applies the ense CRF metho [22] to refine the object bounary by levering low-level (color contrast) information. They consier Potts-moel base pairwise potential functions which enforce local smoothness. CRF-RNN [6] extens this approach by implementing the mean fiel CRF inference as recurrent layers for en-to-en learning of the ense CRF an FCNN network. The work in [23] learns econvolution layers to upsample the low-resolution preictions. The epth estimation metho [24] explores super-pixel pooling for builing the gap between the lowresolution feature map an high-resolution final preiction. Eigen et al. [20] perform coarse-to-fine learning of multiple networks with ifferent resolution outputs for refining the coarse preiction. The methos in [2], [25] explore mi-layer features (skip connections) for high-resolution preiction. Unlike these methos, our metho focuses on improving the coarse (low-resolution) preiction by learning general CNN pairwise potentials to capture semantic relations between patches. These methos are complementary to our metho. Jointly learning CNNs an CRFs has also been explore in other applications apart from segmentation. Recent work in [19], [24] proposes to jointly learn continuous CRFs an CNNs for epth estimation from single monocular images. They focus on continuously-value variable preiction,

3 3 FeatMap-Net Feature map (low resolution) Construct CRF graph (accoring to the feature map resolution) 1. Noe construction: each noe correspons to each spatial position in the feature map. CRF graph 2. Pairwise connections: each noe connects to all noes that lie within a pre-efine range box Fig. 2 An illustration of generating a feature map with FeatMap-Net an constructing the CRF graph. Surrouning Above/Below Fig. 3 An illustration of constructing pairwise connections in a CRF graph. A noe is connecte to all other noes which lie insie the range box (ashe box in the figure). Two types of spatial relations are escribe in the figure, which correspon to two types of pairwise potential functions. while our metho is for iscrete categorical label preiction. The work in [26] combines CRFs an CNNs for human pose estimation. The authors of [27] explore joint training of Markov ranom fiels an eep neural networks for the tasks of preicting wors from noisy images an multi-class classification. They require marginal inference for every graient calculation which is computationally expensive for training eep moels. 2 MODELING SEMANTIC PAIRWISE RELATIONS We first escribe how to buil the CRF graph for moeling semantic pairwise relations. Given an image, we first apply a convolutional network to generate a feature map. We refer to this network as FeatMap-Net, etails of which are presente in Sec. 4 (Fig. 6 shows the overall architecture). With this feature map, we construct one noe in the CRF graph corresponing to one spatial position of the feature map. Fig. 2 illustrates how we construct noes an pairwise connections in a CRF graph. Pairwise connections are constructe by connecting one noe to all other noes which lie within a spatial range box (the ashe box in Fig. 3). We consier ifferent spatial relations by efining ifferent types range boxes, an each type of spatial relation is moele by a specific pairwise potential function. As shown in Fig. 3, our metho moels the surrouning an above/below spatial relations. For the surrouning relation, the range box is centere at the noe. For the above/below relation, the bottom ege of the range box is centere at the noe. In our experiments, the size of the range box (ash box in the figure) size is 0.4a 0.4a, where a is the length of the short ege of the feature map. It woul be straightforwar to construct more pairwise potentials, by varying either the sizes or positions of the connection range boxes, an our approach is not limite to connections within boxes. 3 CONTEXTUAL DEEP CRFS Here we present the etails of our eep CRF moel. We enote by x X one input image an y Y the labeling mask which escribes the label configuration of each noe in the CRF graph. The energy function is enote by E(y, x; θ) which moels the compatibility of the input-output pair, with a small output value inicating high confience in the preiction y. All network parameters are enote by θ which we nee to learn. The conitional likelihoo for one image is formulate as follows: P (y x) = 1 exp [ E(y, x)]. (1) Z(x) Here Z is the partition function, efine as: Z(x) = y exp [ E(y, x)]. The energy function is typically formulate by a set of unary an pairwise potentials: E(y, x) = U(y p, x p ) U U p N U + V (y p, y q, x pq ). (2) V V (p,q) S V Here U is a unary potential function. To make the exposition more general, we consier multiple types of unary potentials with U the set of all such unary potentials. N U is a set of noes for the potential U. Likewise, V is a pairwise potential function with V the set of all types of pairwise potential. S V is the set of eges for the potential V. x p an x pq inicates the corresponing image regions which associate to the specifie noe an ege. The potential function is constructe by a eep network for generating feature map (FeatMap-Net) an a shallow network (Unary-Net or Pairwise-Net) to generate the output of the potential function. Details are escribe in the following sections. An overview of our contextual eep structure moel for preiction an training is shown in Fig Unary potential functions We formulate the unary potential function by stacking the FeatMap-Net for generating feature maps an a shallower fully connecte network (referre to as Unary-Net) to generate the final output of the unary potential function. The unary potential function is written as follows: U(y p, x p ; θ U ) = z p,yp (x; θ U ). (3) Here z p,yp is the output value of Unary-Net, which correspons to the p-th noe an the y p -th class. Fig. 4 shows an illustration of the Unary-Net an how it corporates with FeatMap-Net. Fig. 5 emonstrates the process for generating the feature vector for one noe. The input of the Unary-Net is the noe feature vector extracte

4 4 FeatMap-Net Feature map (low resolution) Network parameter learning: Minimise negative log-likelihoo Coarse level preiction: CRF marginal inference Low resolution preiction Construct CRF graph (accoring to the feature map) Generate features Learning stage Preiction stage One noe Noe feature vector Unary-Net One unary potential output: CRF graph Distribution of the preiction: One pairwise connection (ege) Ege feature vector Pairwise-Net One pairiwise potential output: in which, Other noes/eges Fig. 4 An overview of the propose contextual eep structure moel. Unary-Net an Pairwise-Net are shown here for generating potential function outputs. One noe in the CRF graph spatial corresponence in the feature map Feature map Extract the corresponing feature vector for one noe Feature vector for one noe One pairwise connection (ege) in the CRF graph spatial corresponence in the feature map Feature map Extract the feature vectors of two noes Two noe feature vectors Concatenate Feature vector for pairwise connection 2 Fig. 5 An illustration of generating feature vectors for CRF noes an pairwise connections from the feature map output by FeatMap-Net. The symbol enotes the feature imension. We concatenate the corresponing features of two connecte noes in the feature map to obtain the CRF ege features. from the feature map which is generate by FeatMap-Net. The feature vector for one CRF noe is simply the corresponing feature vector in the feature map. The imension of the Unary-Net output vector for one noe is K, which is the same as the number of classes. 3.2 Pairwise potential functions We formulate the unary potential function, analogous to the unary potentials, by stacking the FeatMap-Net for generating feature maps an a shallower fully connecte network (referre to as Pairwise-Net) to generate the final output of the pairwise potential function. The pairwise potential function is written as follows: V (y p, y q, x pq ; θ V ) = z p,q,yp,y q (x; θ V ). (4) Here z p,q,yp,y q is the output value of Pairwise-Net. It is the confience value for the noe pair (p, q) when they are labele with the class value (y p, y q ), which measures the compatibility of the label pair (y p, y q ) given the input image x. θ V is the corresponing set of CNN parameters for the potential V, which we nee to learn. The role of Pairwise- Net in our structure moel is illustrate in Fig. 4. Fig. 5 escribes the process for generating the feature vector for one pairwise connection. The input of Pairwise-Net is the ege feature vector which is generate from the feature map for two connecte noes. Following the work of [28], we concatenate the corresponing feature vectors of two connecte noes to obtain the CRF ege feature vector. The Pairwise-Net has K 2 output units to match the number of possible label combinations for a pair of noes. Our formulation of pairwise potentials is ifferent from the Potts-moel-base smoothness potentials in the existing methos of [3], [6]. The Potts-moel-base pairwise potentials are a log-linear functions an employ a special formulation for enforcing neighborhoo smoothness base on color contrast, an thus to sharpen object/region bounaries. In contrast, our pairwise potentials moel the semantic compatibility relations between two noes with the output for every possible value of the label pair (y p, y q ) iniviually parameterize by CNNs. Clearly, these two types of pairwise potential formulations have ifferent purposes an effects. Most recent segmentation methos, e.g., the work in [3], [5], [6], [7], have applie the ense CRF metho [22] in the preiction refinement stage for refining (sharpen object bounaries) the coarse (low-resolution) preiction. The

5 5 FeatMap-Net: Image pyrami scale level 1 Conv Block 1-5 (share across scales) Conv Block 6 (for 1 scale only) Conv Block 6 (for 1 scale only) Multi-scale feature maps Sliing pyrami pooling Sliing pyrami pooling 3 3 up-sample concatenate Final feature map 9 scale level 2 Conv Block 6 (for 1 scale only) Sliing pyrami pooling 3 scale level 3 Fig. 6 The etails of our FeatMap-Net. An input image is first resize into 3 scales, then each resize image goes through 6 convolution blocks to output one feature map. Top 5 convolution blocks are share for all scales. Every scale has a specific convolution block (Conv Block 6). We perform 2-level sliing pyrami pooling an concatenate the poole feature map to the original feature map. The symbol enotes the feature imension. ense CRF metho is a Potts-moel-base fully-connecte CRF with pairwise potentials base on color contrast for local smoothness. It is important to clarify that, this smoothness CRFs an our contextual eep CRFs are working in ifferent preiction stages. Our contextual CNN pairwise potentials are applie in the coarse preiction stage to improve the lower-resolution preiction, rather than applying in the bounary refinement stage. In our framework, after obtaining the coarse level preiction, we still nee to perform a refinement step to obtain the final high-resolution preiction (as shown in Fig. 1). Hence we also apply the ense CRF metho [22], as in many other recent methos, in the preiction refinement step. Therefore, our metho takes avantage of both contextual CNN potentials an the traitional smoothness potentials to improve the final result. More etails for preiction can be foun in Sec Asymmetric pairwise potentials As in [29], [30], moeling asymmetric relations requires learning asymmetric potential functions, the output of which shoul epen on the input orer of a pair of noes. In other wors, the potential function is require to be capable of moeling ifferent input orers. Typically we have the following case for asymmetric relations: V (y p, y q, x pq ) V (y q, y p, x qp ). (5) Ieally, the potential V is learne from the training ata. Here we iscuss the asymmetric relation above/below as an example. We take avantage of the input pair orer to inicate the spatial configuration of two noes, thus the input (y p, y q, x pq ) inicates the configuration that the noe p is spatially lies above the noe q. Clearly, the potential function is require to moel ifferent input orers. The asymmetric property is reaily achieve with our general formulation of pairwise potentials. The ege features for the noe pair (p, q) are generate from a concatenation of the corresponing features of noes p an q (as in [28]), in that orer. The potential output for every possible pairwise label combination for (p, q) is iniviually Feature map Sliing pooling winow size: 5x5 Sliing pooling winow size: 9x9 Poole feature map Poole feature map 3 Concatenate feature map Fig. 7 Details for sliing pyrami pooling. We perform 2-level sliing pyrami pooling on the feature map for capturing patchbackgroun context, which encoe rich backgroun information an increase the fiel-of-view for the feature map. parameterize by the pairwise CNNs. These factors ensure that the ege response is orer epenent, easily satisfying the asymmetric requirement. 4 EXPLORING BACKGROUND CONTEXT WITH FEATMAP-NET We evelop multi-scale CNNs an sliing pyrami pooling in our FeatMap-Net to encoe rich backgroun information for capturing patch-backgroun context. Fig. 6 shows the architecture of FeatMap-Net. Details are presente shortly in the squeal. Applying CNNs on multi-scale images has shown improve performance in some recent segmentation methos, e.g., [1], [8]. In our multi-scale network, an input image is first resize into 3 scales, then each resize image goes through 6 convolution blocks to output one feature map. In our experiment, the 3 scales for the input image are set to 1.2, 0.8 an 0.4. All scales share the same top 5 convolution blocks. In aition, each scale has an exclusive convolution block ( Conv Block 6 in the figure) which captures scaleepenent information. The resulting 3 feature maps (corresponing to 3 scales) are of ifferent resolutions, therefore

6 6 we upscale the two smaller ones to the size of the largest feature map using bilinear interpolation. These feature maps are then concatenate to form one feature map. We perform spatial pyrami pooling [10] (a moifie version using sliing winows) on the feature map to capture information from backgroun regions in multiple sizes. From another point of view, this increases the fiel-of-view for the feature map, for which feature vectors are able to encoe information from a larger image region. Increasing the fiel-of-view generally helps to improve performance, which is also iscusse in [3]. The etails of spatial pyrami pooling are illustrate in Fig. 7. In our experiment, we perform 2-level pooling for each image scale. We efine 5 5 an 9 9 sliing pooling winows with max-pooling to generate 2 sets of poole feature maps. These poole feature maps are then concatenate to the original feature map to construct the final feature map, an thus the resulting feature imension is for one image scale. 5 NETWORK CONFIGURATIONS We show the etaile network layer configuration for all networks in Fig. 8. For FeatMap-Net, the configuration of the convolution blocks is similar to the VGG-16 moel [21]. The top 5 convolution blocks share the same configuration as the VGG-16 network. The first fully-connecte layer in VGG-16 is converte into a convolution layer ( see FCN in [2] for etails) an merge into the 5-th convolution block. We only transfer the first fully-connecte (FC) layer into our network rather than 2 FC layers. Note that transferring 2 FC layers is commonly applie in almost all recent FCN base methos [2], [3], [6]. The FC layer in the VGG-16 moel contains a large number of filters (4096), thus our network which transfers only one FC layer is more efficient. In FeatMap-Net, we a a new convolution block ( Conv Block 6 in the figure) which contains 2 convolution layers. This extra convolution block is not existe in the VGG-16 network. With this new convolution block, we are able to capture scale-epenent information an increase the abstraction level. We also have the consieration of increasing the fiel-of-view for the final feature map by aing this extra block. As iscusse in Sec. 1.1, The strie setting of the convolution an pooling layers will result in a feature map which has a smaller resolution than the input image. For the convolution an pooling layers, the resolution of the output feature map is own-sample if the strie is greater than 1. Note that there are a number of convolution/pooling layers in VGG-16 moel which use the strie setting of 2. Therefore, for the original VGG-16 moel, the resolution of the output feature map is 32 times smaller than the size of the input image (see FCN [2] for etails). To increase the resolution of the feature map, almost all recent VGG- 16 base methos [2], [3], [6] reuce the strie of the last two pooling layers to 1, which reuces the own-sampling factor from 32 to 8. In our setting, we reuce the strie of the last max pooling layer (only one layer) in the VGG-16 network to 1, instea of reucing for two pooling layers in many other methos [3], [6]. The resolution of the resulting feature map Conv block 1: 3 x 3 conv 64 3 x 3 conv 64 2 x 2 pooling Conv block 4: 2 x 2 pooling Unary-Net 2 fully-connecte layers: Fully-con 512 Fully-con K FeatMap-Net Conv block 2: 3 x 3 conv x 3 conv x 2 pooling Conv block 5: 2 x 2 pooling 7 x 7 conv 4096 Conv block 3: 3 x 3 conv x 3 conv x 3 conv x 2 pooling Conv block 6: Pairwise-Net 2 fully-connecte layers: Fully-con 512 Fully-con K 2 Fig. 8 The etaile configuration of the networks: FeatMap-Net, Unary-Net an Pairwise-Net. K is the number of classes. The filter size for convolution an the number of filters are shown for all layers. For FeatMap-Net, the top 5 convolution blocks share the same configuration as the convolution blocks in the VGG-16 network. The strie of the last max pooling layer is 1, an for the other max pooling layers we use the same strie setting as the VGG-16 network. is 16 times smaller than the size of the input image. For a input image, the resolution of the resulting feature map is aroun Directly changing the strie inevitably egraes the performance of the learne filters since the fiel-of-view of the input feature map for some filters is change. To preserve the fiel-of-view, recent work has propose a number of approaches. For example, a straightforwar approach is to increase the receptive fiel size of the filter (e.g., ouble the filter size). Large filter sizes will significantly increase the computation cost for convolution operations. This approach also brings the problem of how to upsample the filter weights. Probably a better approach is to apply the hole algorithm as in [3], which performs a skipping (sample) ot-prouct calculation for filter convolution. Therefore, a large convolution winow size can be applie without increasing the computation cost. Different from existing approaches, here we apply a simple yet effective approach. We a extra two 3 3 convolution layers ( Conv Block 6 ) instea of increasing the filter size. These extra layers are able to enlarge the fiel-of-view an compensate the sie-effect of reucing the strie in pooling layers. 6 PREDICTION At the preiction stage, our eep structure moel generates low-resolution preiction (as shown in Fig. 1), which is 1/16 of the input image size. As iscusse in Sec. 5, this is ue to the strie setting of pooling layers. Therefore, we apply two preiction stages for obtaining the final high-resolution preiction: the coarse-level preiction stage an the preiction refinement stage. We first perform CRF inference on our contextual structure moel to generate a score map for coarse-level preiction, then we bilinearly

7 7 1. Coarse-level preiction stage: Score maps in low-resolution eep contextual moel CRF inference 2. Preiction refinement stage: Final preiction Sharpen bounary Score maps bilinearly upsample Fig. 9 An illustration of our two-stage preiction process. The preiction process consists of two stages: the coarse-level preiction stage an the preiction refinement stage. We first perform CRF inference on our contextual moel to generate a score map for coarse-level preiction, then we bilinearly unsmaple the score map an apply a bounary refinement metho [22] to obtain the final preiction which has the same resolution as the input image. unsmaple the score map an apply a bounary refinement metho [22] to obtain the final preiction which has the same resolution as the input image. This two-stage preiction process is illustrate in Fig Coarse-level preiction stage We perform CRF inference on our contextual structure moel to obtain the coarse preiction of a test image. For example, we can solve the maximum a posteriori (MAP) problem: y = argmax y P (y x). Alternatively, we also can consier the marginal inference over noes for preiction: p N : P (y p x) = y\y p P (y x). (6) We obtain the marginal istribution for each noe after performing this marginal inference. This marginal istribution can be further applie in the next preiction stage for bounary refinement. Details are shown in the next section. Our CRF graph oes not form a tree structure, nor are the potentials submoular, hence we nee to an apply approximate inference. To aress this we apply an efficient message passing algorithm which is base on the mean fiel approximation [31]. The mean fiel algorithm constructs a simpler istribution Q(y), e.g., a prouct of inepenent marginals: Q(y) = p N Q p(y p ), which minimizes the KLivergence between the istribution Q(y) an P (y). In our experiments, we perform 3 mean fiel iterations. 6.2 Preiction refinement stage We generate the score map for the coarse preiction from the marginal istribution which we obtain from the mean-fiel inference. We first bilinearly up-sample the score map of the coarse preiction to the size of the input image. Then we apply a common post-processing metho [22] (ense CRF) to sharpen the object bounary for generating the final highresolution preiction. This post-processing metho leverages low-level pixel intensity information (color contrast) for bounary refinement. Note that most recent work on image segmentation prouce low-resolution preiction an have a upsampling an refinement process/moel for the final preiction, e.g., [3], [6], [7]. In summary, we simply perform bilinear upsampling of the coarse score map an apply the bounary refinement post-processing. We argue that this stage can be further improve by applying more sophisticate refinement methos, e.g., training econvolution networks [23] training multiple coarse to fine learning networks [20], an exploring mile layer features for high-resolution preiction [2], [25]. It is expecte that applying better refinement approaches will gain further performance improvement. In the experiment part, we show an example of exploring the feature maps from mile layers to refine the coarse preiction. We apply this improve refinement approach on the ataset PASCAL VOC Refer to Sec. 9.2 for etails. 7 CRF TRAINING A common approach to CRF learning is to maximize the likelihoo, or equivalently minimize the negative loglikelihoo, which can be written for one image as: log P (y x; θ) = E(y, x; θ) + log Z(x; θ). (7) Aing regularization to the CNN parameter θ, the optimization problem for CRF learning is: min θ λ N 2 θ 2 2 log P (y (i) x (i) ; θ). (8) i=1 Here x (i), y (i) enote the i-th training image an its segmentation mask; N is the number of training images; λ is the weight ecay parameter. Substituting (7) into (8) yiels: min θ λ N 2 θ i=1 [ ] E(y (i), x (i) ; θ) + log Z(x (i) ; θ). (9) We can apply stochastic graient (SGD) base methos to optimize the above problem for learning θ. The energy function E(y, x; θ) is constructe from CNNs, an its graient θ E(y, x; θ) easily compute by applying the chain rule as in conventional CNNs. However, the partition function Z brings ifficulties for optimization. Its graient is written: θ log Z(x; θ) = θ log exp [ E(y, x)] y = exp [ E(y, x; θ)] y y exp [ E(y, x; θ)] θ[ E(y, x; θ)] = E y P (y x;θ) θe(y, x; θ) (10) Generally the size of the output space Y is exponential in the number of noes, which prohibits the irect calculation of Z an its graient. The CRF graph we consiere for segmentation here is a loopy graph (not tree-structure), in which a large number of noes (more than 1000) an pairwise connections (more than ) are involve for one image. For loopy graph with large number of noes an eges, typically approximation is require for inference, an even this is generally computationally expensive.

8 8 More importantly, usually a large number of SGD iterations are require for training CNNs. Typically the number of iterations is in tens or hunres of thousans. Thus performing inference at each SGD iteration is very computationally expensive. 7.1 Piecewise training of CRFs Instea of irectly solving the optimization in (9), we propose to apply an approximate CRF learning metho. In the literature, there are two popular types of learning methos which approximate the CRF objective: pseuo-likelihoo learning [32] an piecewise learning [9]. The main avantage of these methos in term of training eep CRF is that they o not involve marginal inference for graient calculation, which significantly improves the efficiency of training. Decision tree fiels [33] an regression tree fiels [34] are base on pseuo-likelihoo learning, while piecewise learning has been applie in the work [9], [28]. Here we evelop this iea for the case of training the CRF with the CNN potentials. In piecewise training, the conitional likelihoo is formulate as a number of inepenent likelihoos efine on potentials, written as: P (y x) = P U (y p x) P V (y p, y q x). U U p N U V V (p,q) S V The likelihoo P U (y p x) is constructe from the unary potential U. Likewise, P V (y p, y q x) is constructe from the pairwise potential V. P U an P V are written as: P U (y p x) = exp [ U(y p, x p )] y p exp [ U(y p, x p )], (11) P V (y p, y q x) = exp [ V (y p, y q, x pq )] y p,y q exp [ V (y p, y q, x pq )]. (12) The log-likelihoo for piecewise training is then: log P (y x) = log P U (y p x) U U p N U + log P V (y p, y q x). (13) V V (p,q) S V The optimization problem for piecewise training is to minimize the negative log likelihoo with regularization: min θ λ N 2 θ V V i=1 [ U U (p,q) S (i) V p N (i) U log P U (y p x (i) ; θ U ) ] log P V (y p, y q x (i) ; θ V ). (14) Compare to the objective in (9) for irect maximum likelihoo learning, the above objective oes not involve the global partition function Z(x; θ). To calculate the graient of the above objective, we only nee to calculate the graient θu log P U an θv log P V. With the efinition in (11), P U is a conventional softmax normalization function over only K (the number of classes) elements. Similar analysis can also be applie to P V. Hence, we can easily calculate the graient without involving expensive inference. Moreover, we are able to perform parallele training of potential functions, since the above objective is formulate by a summation of inepenent log-likelihoos. As previously iscusse, CNN training usually involves a large number of graient upate iteration which prohibit the repeate expensive inference. Our piecewise approach here provies a practical solution for learning CRFs with CNN potentials on large-scale ata. 8 IMPLEMENTATION DETAILS For the FeatMap-Net, the first 5 convolution blocks an the first convolution layer in the 6th convolution block are initialize from the VGG-16 network [21]. All remaining layers are ranomly initialize. Note that VGG-16 network is wiely applie in recent segmentation methos. All layers are traine using back-propagation/stochastic graient escen (SGD). We apply simple ata augmentation in the training stage. Specifically, we perform ranom scaling (from 0.7 to 1.2) an flipping of the images for training. As illustrate in Fig. 3, we use 2 types of pairwise potential functions. In total, we have 1 type of unary potential function an 2 types of pairwise potential functions. We formulate one specific FeatMap-Net an potential network (Unary-Net or Pairwise-Net) for one type of potential function. In other wors, one type of potential function is constructe by one FeatMap-Net an a shallow potential network. More etails of FeatMap-Net an Unary/Pairwise- Net can be foun in Fig. 4. There are two main benefits of moeling specific FeatMap-Net for each potential instea of sharing one FeatMap-Net across potentials. Different types of potentials have ifferent focus an thus probably require separate feature maps. Using separate FeatMap-Net allows generating specific high-level features for the corresponing potential function. Moreover, with separate FeatMap-Net, we are able to parallel the training of ifferent types of potentials, an thus ease the implementation an spee up the training. 8.1 Efficient learning As previously iscusse, one noe in the CRF graph is connecte to all noes which lie within a preefine range box. Uner this setting, the number of pairwise connections for one noe can be a few hunre. For example, for an input image with a resolution of pixels an 1.2x scaling, the resolution of the feature map after going through FeatMap-Net is aroun In this case, the number of noes are aroun 1200, an the number of connections is aroun 200 for one noe. Hence for this image, we nee to process pairwise relations for generating the ege features, passing forwar the Pairwise- Net an back-propagating the graients (in the training stage). These operations are consierably computationally expensive for such a large number of pairwise connections. If using high feature imension for the feature map, these operations can even run out of the GPU memory. In our solution, to spee up the training an testing of the pairwise potentials, we perform sampling of pairwise connections for each noe in the CRF graph. Since the feature map encoes reunant information in local regions, performing sampling can still preserve sufficient pairwise

9 9 relations while removing reunancies. Specifically, we sample 24 neighboring noes base on a regular 5 5 gri spanning the range box (excluing self-connection), an thus we have 24 pairwise connections for each noe, which is an orer of magnitue fewer connections than the original setting. We observe that this sampling setting, which reuces the number of pairwise connections significantly, spees up the training without egraing the performance. 8.2 Asynchronous graient upate The number of pairwise connections is still large even with sampling, which brings the problem of keeping the ege features in the GPU memory. Moreover, consiering a large number of pairwise connections (more than ) in one iteration for upating the parameters of Pairwise-Net might result in egrae graients. This is similar to the case that using an extremely large batch size for graient calculation in the training of a conventional classification network. An extremely large batch size for graient upate can significantly slow own the convergence an may ecrease the performance [35]. Moreover, as iscusse in [36], using small batch size may perform noise injection in the graient calculation as is a form of regularization, which may lea to better parameter solutions. Overall, from both empirical observations an theoretical analysis, a appropriate setting of batch size is key to the network training. Therefore, we probably shoul not consier all pairwise connections in one graient iteration for upating the Pairwise-Net. To reuce the GPU memory consumption an improve the batch upate for the Pairwise-Net, we perform asynchronous graient upate for training the FeatMap-Net an Pairwise-Net. With asynchronous graient upate, the graient calculations for ifferent parts of the network are not require in the same iteration, which breaks the epenency between ifferent parts (or layers) of the networks. Asynchronous graient upate is wiely applie in largescale istribute network learning. For etails one may refer to [37]. Specifically, in one stochastic graient iteration, we perform multiple sub-iterations of graient upate for the Pairwise-Net an collect the graients for the FeatMap-Net. In each sub-iteration, a subset of pairwise connections is selecte (e.g., 2000) for graient calculation an the parameters of Pairwise-Net are upate. Clearly, in each subiteration we only process a small number of connections for upating the network parameters of Pairwise-Net, thus GPU consumption is low an the batch size for learning Pairwise-Net is reuce. This asynchronous approach aresses the problems of GPU memory an large batch size for training Pairwise-Net. After going through all pairwise connections, we collect the graients for FeatMap-Net, an perform a conventional back-propagation graient upate to FeatMap-Net. 9 E XPERIMENTS We evaluate our metho on 8 challenging semantic segmentation atasets: PASCAL VOC 2012, NYUDv2, PASCAL-Context, SIFT-flow, SUN-RGBD, KITTY, COCO an Cityscapes, which covers various types of scene images, (a) Testing (b) Groun Truth (c) Preiction Fig. 10 Preiction examples of our metho on NYUDv2 ataset. TABLE 1 Segmentation results on NYUDv2 ataset (40 classes). We compare to a number of recent methos. Our metho significantly outperforms the existing methos. metho Gupta et al. [38] FCN-32s [2] FCN-HHA [2] ours-unary ours ours+ training ata RGB-D RGB RGB-D RGB RGB RGB pixel accuracy mean accuracy IoU incluing inoor/outoor scene, street scene, etc. Our comprehensive experiments show that the propose metho achieves new state-or-the-art performance on these atasets. Our system is built on MatConvNet [40]. The segmentation performance is measure by the intersection-over-union (IoU) score [41], the pixel accuracy an the mean accuracy on categories [2]. We enote cij as an element in the confusion matrix, which is the number of pixels with the i-th category as the groun truth an the j th category as the preiction; ti is the total number of pixels for the i-th category in the groun truth; K is the number of categories. Pixel accuracy measures the portion of correctly P i cii P. Mean accuracy measures the perpreicte pixels: i ti 1 P cii category pixel accuracy: K i ti. IoU score calculates the portion of the intersection between the groun truth an the 1 P P cii preiction: K. i ti + cij cii j 9.1 Results on the NYUDv2 ataset We first evaluate our metho on the NYUDv2 [42] ataset which has 1449 RGB-D inoor scene images. We use the segmentation labels provie in [43] for which the labels are processe into 40 classes. We use the stanar training set which contains 795 images an the test set which contains 654 images. We train our moels only on RGB images without using the epth information.

10 10 (a) Testing (b) Groun Truth (c) Preiction () Testing (e) Groun Truth (f) Preiction (g) Testing (h) Groun Truth (i) Preict Fig. 11 Some preiction examples of our metho on PASCAL VOC 2012 ataset. TABLE 2 Ablation Experiments. The table shows the value ae by the ifferent system components of our metho on the NYUDv2 ataset (40 classes). metho FCN-32s [2] FullyConvNet Baseline + sliing pyrami pooling + multi-scales + bounary refinement + CNN contextual pairwise pixel accuracy mean accuracy IoU Results are shown in Table 1. Some preiction examples are shown in Fig. 10 Unless otherwise specifie, our moels are initialize using the VGG-16 network. VGG-16 is also use in most competing methos such as FCN [2]. Our unary-only moel alreay outperforms the competing methos even though it makes no use of the extra epth information. This verifies the effectiveness of the multiscale network architecture an sliing pyrami pooling in our metho which captures rich backgroun contextual information. Applying our full contextual moel with CNN pairwise potentials, we attain significant further improvement over the unary-only moel, which sets new state-ofthe-art performance on the NYUDv2 ataset. Our contextual CNN pairwise potentials bring significant improvement (2.5 percentage point) over the unary-only moel, which clearly verifies its effectiveness. To further explore the potential of our metho, we pretrain our moel on the Microsoft COCO ataset [44] with labels in 80 classes, an we initialize from this moel an train on the NYUD images. This result is enote as ours+ in the result table. Our moel gets further improve with the COCO image pre-training. On this ataset, we have improve the segmentation accuracy by a large margin even without using the epth information Component evaluation We evaluate the performance contribution of ifferent components of the FeatMap-Net for capture patch-backgroun context on the NYUDv2 ataset. We present the results of aing ifferent components in FeatMap-Net, which are shown in Table 2. We start from a baseline setting of our FeatMap-Net ( FullyConvNet Baseline in the result table), for which multi-scale an sliing pooling is remove. This baseline setting is the conventional fully convolution network for segmentation, which can be consiere as our implementation of the FCN metho in [2]. The result shows that our CNN baseline implementation ( FullyConvNet ) achieves very similar performance (slightly better) than the FCN metho. Clearly, our multi-scale network esign an sliing pyrami pooling significantly improve the performance, which shows the effectiveness of the propose approach for encoing rich backgroun context. Applying the ense CRF metho [22] for bounary refinement gains further improvement. Finally, aing our contextual CNN pairwise potentials, we achieve very significant further improvement. 9.2 Results on the PASCAL VOC 2012 ataset PASCAL VOC 2012 [41] is a well-known segmentation evaluation ataset which consists of 20 object categories an one backgroun category. This ataset is split into a training set, a valiation set an a test set, which respectively contain 1464, 1449 an 1456 images. Following a conventional setting in [3], [15], the training set is augmente by extra annotate VOC images provie in [45], which results in training images. We verify our performance on the PASCAL VOC 2012 test set. We compare with a number of recent methos with competitive performance. Since the groun truth labels are not available for the test set, we evaluate our metho through the VOC evaluation server.

11 TABLE 3 Iniviual category results on the PASCAL VOC 2012 test set (IoU scores). Our metho performs the best 11 aero bike bir boat bottle bus car cat chair cow metho mean Only using VOC training ata FCN-8s [2] Zoom-out [8] DeepLab [3] CRF-RNN [6] DeconvNet [23] DPN [39] ours Using VOC+COCO training ata DeepLab [3] CRF-RNN [6] BoxSup [7] DPN [39] ours table og horse mbike person potte sheep sofa train tv The IoU scores are shown in the last column of Table 3. Preiction examples of our metho are shown in Fig. 11. We first train our moel only using the VOC images. We achieve an IoU score of 75.3, which is the best result amongst methos that only use the VOC training ata. 1 To improve the performance, following the setting in recent work [3], [7], we train our moel with the extra images from the COCO ataset [44]. With these extra training images, we achieve an IoU score of As escribe in Sec. 6, our eep structure moel generates low-resolution coarse preiction, which is 1/16 of the input image size. To obtain the final high-resolution preiction we apply a simple yet effective approach: we first perform bilinear upsampling of the coarse score map an then apply the bounary refinement post-processing [22]. To improving this simple approach, we exploit the feature maps from mile layers to refine the coarse preiction an prouce high-resolution preiction, which is similar to the methos in [2], [3], [25]. With this improve refinement approach, we finally achieve an IoU score of 77.8, which is best reporte result on this challenging ataset. 2 The feature maps from the mile layers encoe lower level visual information (from ege patterns to texture/object part patterns) an have higher resolution than the final output, thus it is expecte that learning extra layers on these feature maps helps to preict etails in the object bounaries. Specifically, we a refinement layers on top of the feature maps from the first 5 max-pooling layers an the score map of the coarse preiction (output by our eep structure moel). Details are shown in Fig. 12. These refinement layers play a role of refining the coarse preiction by exploring mile layer features, which increase the resolution of the preiction from 1/16 (coarse preiction) to 1/2 of the input image. With this improve preiction, we perform bounary refinement using [22] to generate the final preiction. The results for each category are shown in Table 3. We outperform comparing methos in most categories. For only using the VOC training set, our metho outperforms the secon best metho, DPN [39], on 18 categories out of 20. For using VOC+COCO training set, our metho outperforms DPN [39] on 15 categories out of The result link at the VOC evaluation server: ox.ac.uk:8080/anonymous/keffm4.html 2. The result link at the VOC evaluation server: ox.ac.uk:8080/anonymous/mvtntx.html Input Image 1/1 size Input score map (coarse preiction) 1/16 size Feature maps from top 5 max-pooling layers 1/2 size 1/4 size Apply 2 Conv layers for each feature map upsample an concatenate 1/2 size 3 Conv layers Refine score map 1/2 size Fig. 12 The illustration of exploiting the feature maps from mile layers to refine the low-resolution (1/16 of the input image) coarse preiction. The refine preiction has a resolution of 1/2 of the input image. TABLE 4 Segmentation results on the val set of Cityscapes ataset (19 classes). Our metho significantly outperforms the fully convolution network baseline ( FullyConvNet ). metho pixel accuracy mean accuracy IoU FullyConvNet FullyConvNet + refine ours Results on the Cityscapes ataset The large scale outoor image ataset Cityscapes 3 [46] contains street scene images from 50 ifferent cities. This ataset provies pixel-level semantic segmentation labels of 5000 images for 25 classes incluing roa, car, peestrian, bicycle, sky etc. The provie trainval set has 3475 image. We use the training set (2975 images) for training. Since the groun truth of the test set is not available, we evaluate on the val set which contains 500 images. Following the ataset evaluation protocol, 19 classes are vali for evaluation, the rest 6 classes are treate as voi which are not consiere in evaluation. Results are shown in Table 4. Preiction examples are shown in Fig. 13. Since this is a newly release ataset, there is no publishe results for comparison. Therefore, we present two baseline results for comparison. The first baseline ( FullyConvNet in the table) is our implementation of the conventional fully convolution network for segmentation. More etails of this baseline metho can be foun in Sec The secon baseline comparison metho ( FullyConvNet + refine in the table) applies the ense CRF [22] on top of FullyConvNet 3. Avaiable at:

12 12 (a) Testing (b) Groun Truth () Testing (c) Preiction (e) Groun Truth (f) Preiction Fig. 13 Preiction examples of our metho on Cityscapes ataset. TABLE 5 Segmentation results on PASCAL-Context ataset (60 classes). Our metho performs the best. metho O2P [47] CFM [48] FCN-8s [2] BoxSup [7] ours pixel accuracy mean accuracy IoU for bounary refinement. Clearly, our metho significantly outperforms these baseline methos. 9.4 metho Liu et al. [51] Ren et al. [52] Kenall et al. [53] ours training ata RGB-D RGB-D RGB RGB pixel accuracy mean accuracy IoU Results are shown in Table 6. Our metho outperform the existing methos by a large margin, even though we oes not make use of the epth information for training. Results on the PASCAL-Context ataset The PASCAL-Context [49] ataset provies the segmentation labels of the whole scene (incluing the stuff labels) for the PASCAL VOC images. We use the segmentation labels which contain 60 classes (59 classes plus the backgroun class ) for evaluation. We use the provie training/test splits. The training set contains 4998 images an the test set contains 5105 images. Results are shown in Table 5. Preiction examples are shown in Fig. 14 Our metho significantly outperform the competing methos. To our knowlege, ours is the best reporte result on this ataset. 9.5 TABLE 6 Segmentation results on SUN-RGBD ataset (37 classes). We compare to a number of recent methos. Our metho significantly outperforms the existing methos. Results on the SUN-RGBD ataset SUN-RGBD [50] is a segmentation ataset contains aroun 10, 000 inoor images an provies pixel labeling masks of 37 classes, which is an extension of the NYUD ataset [42]. 9.6 Results on the COCO ataset The COCO ataset [44] contains more than 1 million images an provie segmentation labels for 80 classes. Since the test set is not available, we generate 2599 images for testing an the remaining images are for training. We select the test images on a class balance basis which ensures every category at lease appears in 50 images. Labeling regions which are smaller than 200 pixels are treate as voi which are not consiere in training an evaluation. Results are shown in Table 7. We compare to two baseline methos which are base on conventional fully convolution networks. The etails of these baseline methos are the same as that for the Cityscapes ataset (see Sec. 9.3). The results shows that our metho significantly outperforms the baselines.

13 TABLE 8 Segmentation results on SIFT-flow ataset (33 classes). Our metho performs the best. metho pixel accuracy mean accuracy IoU Liu et al. [51] Tighe et al. [54] Tighe et al. (MRF) [54] Farabet et al. (balance) [1] Farabet et al. [1] Pinheiro et al. [55] FCN-16s [2] ours TABLE 9 Segmentation results on KITTI ataset (10 classes). We compare to a number of recent methos. Our metho significantly outperforms the existing methos. metho pixel accuracy mean accuracy IoU Caena et al. [58] Zhang et al. [57] ours ours (a) Testing (b) Groun Truth (c) Preiction Fig. 14 Preiction examples on PASCAL-Context ataset. TABLE 7 Segmentation results on COCO ataset (80 classes). Our metho significantly outperforms the fully convolution network ( FullyConvNet ). metho pixel accuracy mean accuracy IoU FullyConvNet FullyConvNet + refine ours Results on the SIFT-flow ataset We further evaluate our metho on the SIFT-flow ataset. This ataset contains 2688 images an provies the segmentation labels for 33 classes. We use the stanar split for training an evaluation. The training set has 2488 images an the test set has 200 images. Since the images are in small sizes, we upscale the image by a factor of 2 for training. Results are shown in Table 8. We achieve the best performance on this ataset. 9.8 Results on the KITTI ataset We perform further evaluation on the KITTI ataset [56] for roa image segmentation. Zhang at el. [57] provie semantic segmentation labels of 10 classes for 252 images, in which 140 images are for training an the remaining 112 are for testing. We follow the provie training an testing splits for evaluation an report the results in Table 9. Clearly, our metho performs the best. To further improve the performance, we perform pre-training on COCO images, for which the result is enote by ours+ in the result table. 10 CONCLUSIONS We have propose a metho which combines CNNs an CRFs to exploit complex contextual information for semantic image segmentation. Basically, we formulate CNN base pairwise potentials for moeling semantic relations between image regions. We have performe comprehensive experiments on 8 challenging segmentation atasets an we achieve best performance on all evaluate ataset incluing the PASCAL VOC 2012 ataset. The propose metho is potentially wiely applicable to other tasks. ACKNOWLEDGMENTS This research was supporte by the Data to Decisions Cooperative Research Centre an by the Australian Research Council through the ARC Centre for Robotic Vision (CE ). C. Shen s participation was in part supporte by an ARC Future Fellowship (FT ). I. Rei s participation was in part supporte by an ARC Laureate Fellowship (FL ). C. Shen is the corresponing author. REFERENCES [1] C. Farabet, C. Couprie, L. Najman, an Y. LeCun, Learning hierarchical features for scene labeling, IEEE T. Pattern Analysis & Machine Intelligence, [2] J. Long, E. Shelhamer, an T. Darrell, Fully convolutional networks for semantic segmentation, in Proc. IEEE Conf. Comp. Vis. Pattern Recogn., [3] L. Chen, G. Papanreou, I. Kokkinos, K. Murphy, an A. L. Yuille, Semantic image segmentation with eep convolutional nets an fully connecte CRFs, in Proc. Int. Conf. Learning Representations, [4] G. Heitz an D. Koller, Learning spatial context: Using stuff to fin things, in Proc. European Conf. Computer Vision, [5] A. G. Schwing an R. Urtasun, Fully connecte eep structure networks, 2015, [6] S. Zheng, S. Jayasumana, B. Romera-Parees, V. Vineet, Z. Su, D. Du, C. Huang, an P. Torr, Conitional ranom fiels as recurrent neural networks, in Proc. Int. Conf. Comp. Vis., [7] J. Dai, K. He, an J. Sun, BoxSup: Exploiting bouning boxes to supervise convolutional networks for semantic segmentation, in Proc. Int. Conf. Comp. Vis., [8] M. Mostajabi, P. Yaollahpour, an G. Shakhnarovich, Feeforwar semantic segmentation with zoom-out features, in Proc. IEEE Conf. Comp. Vis. Pattern Recogn., [9] C. A. Sutton an A. McCallum, Piecewise training for unirecte moels, in Proc. Conf. Uncertainty Artificial Intelli, 2005.

Exploring Context with Deep Structured models for Semantic Segmentation

Exploring Context with Deep Structured models for Semantic Segmentation APPEARING IN IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, APRIL 2017. 1 Exploring Context with Deep Structure moels for Semantic Segmentation Guosheng Lin, Chunhua Shen, Anton van en

More information

Learning Deep Structured Models for Semantic Segmentation. Guosheng Lin

Learning Deep Structured Models for Semantic Segmentation. Guosheng Lin Learning Deep Structured Models for Semantic Segmentation Guosheng Lin Semantic Segmentation Outline Exploring Context with Deep Structured Models Guosheng Lin, Chunhua Shen, Ian Reid, Anton van dan Hengel;

More information

Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation

Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation 1 Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel The University of Adelaide, Australia; and Australian Centre

More information

Dense Disparity Estimation in Ego-motion Reduced Search Space

Dense Disparity Estimation in Ego-motion Reduced Search Space Dense Disparity Estimation in Ego-motion Reuce Search Space Luka Fućek, Ivan Marković, Igor Cvišić, Ivan Petrović University of Zagreb, Faculty of Electrical Engineering an Computing, Croatia (e-mail:

More information

Classifying Facial Expression with Radial Basis Function Networks, using Gradient Descent and K-means

Classifying Facial Expression with Radial Basis Function Networks, using Gradient Descent and K-means Classifying Facial Expression with Raial Basis Function Networks, using Graient Descent an K-means Neil Allrin Department of Computer Science University of California, San Diego La Jolla, CA 9237 nallrin@cs.ucs.eu

More information

Exploiting Depth from Single Monocular Images for Object Detection and Semantic Segmentation

Exploiting Depth from Single Monocular Images for Object Detection and Semantic Segmentation APPEARING IN IEEE TRANSACTIONS ON IMAGE PROCESSING, OCTOBER 2016 1 Exploiting Depth from Single Monocular Images for Object Detection and Semantic Segmentation Yuanzhouhan Cao, Chunhua Shen, Heng Tao Shen

More information

Skyline Community Search in Multi-valued Networks

Skyline Community Search in Multi-valued Networks Syline Community Search in Multi-value Networs Rong-Hua Li Beijing Institute of Technology Beijing, China lironghuascut@gmail.com Jeffrey Xu Yu Chinese University of Hong Kong Hong Kong, China yu@se.cuh.eu.h

More information

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs Zhipeng Yan, Moyuan Huang, Hao Jiang 5/1/2017 1 Outline Background semantic segmentation Objective,

More information

1 Surprises in high dimensions

1 Surprises in high dimensions 1 Surprises in high imensions Our intuition about space is base on two an three imensions an can often be misleaing in high imensions. It is instructive to analyze the shape an properties of some basic

More information

Shift-map Image Registration

Shift-map Image Registration Shift-map Image Registration Svärm, Linus; Stranmark, Petter Unpublishe: 2010-01-01 Link to publication Citation for publishe version (APA): Svärm, L., & Stranmark, P. (2010). Shift-map Image Registration.

More information

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION Kingsley Kuan 1, Gaurav Manek 1, Jie Lin 1, Yuan Fang 1, Vijay Chandrasekhar 1,2 Institute for Infocomm Research, A*STAR, Singapore 1 Nanyang Technological

More information

6 Gradient Descent. 6.1 Functions

6 Gradient Descent. 6.1 Functions 6 Graient Descent In this topic we will iscuss optimizing over general functions f. Typically the function is efine f : R! R; that is its omain is multi-imensional (in this case -imensional) an output

More information

Particle Swarm Optimization Based on Smoothing Approach for Solving a Class of Bi-Level Multiobjective Programming Problem

Particle Swarm Optimization Based on Smoothing Approach for Solving a Class of Bi-Level Multiobjective Programming Problem BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 17, No 3 Sofia 017 Print ISSN: 1311-970; Online ISSN: 1314-4081 DOI: 10.1515/cait-017-0030 Particle Swarm Optimization Base

More information

Using Vector and Raster-Based Techniques in Categorical Map Generalization

Using Vector and Raster-Based Techniques in Categorical Map Generalization Thir ICA Workshop on Progress in Automate Map Generalization, Ottawa, 12-14 August 1999 1 Using Vector an Raster-Base Techniques in Categorical Map Generalization Beat Peter an Robert Weibel Department

More information

Almost Disjunct Codes in Large Scale Multihop Wireless Network Media Access Control

Almost Disjunct Codes in Large Scale Multihop Wireless Network Media Access Control Almost Disjunct Coes in Large Scale Multihop Wireless Network Meia Access Control D. Charles Engelhart Anan Sivasubramaniam Penn. State University University Park PA 682 engelhar,anan @cse.psu.eu Abstract

More information

Shift-map Image Registration

Shift-map Image Registration Shift-map Image Registration Linus Svärm Petter Stranmark Centre for Mathematical Sciences, Lun University {linus,petter}@maths.lth.se Abstract Shift-map image processing is a new framework base on energy

More information

Transient analysis of wave propagation in 3D soil by using the scaled boundary finite element method

Transient analysis of wave propagation in 3D soil by using the scaled boundary finite element method Southern Cross University epublications@scu 23r Australasian Conference on the Mechanics of Structures an Materials 214 Transient analysis of wave propagation in 3D soil by using the scale bounary finite

More information

A Duality Based Approach for Realtime TV-L 1 Optical Flow

A Duality Based Approach for Realtime TV-L 1 Optical Flow A Duality Base Approach for Realtime TV-L 1 Optical Flow C. Zach 1, T. Pock 2, an H. Bischof 2 1 VRVis Research Center 2 Institute for Computer Graphics an Vision, TU Graz Abstract. Variational methos

More information

Multilevel Linear Dimensionality Reduction using Hypergraphs for Data Analysis

Multilevel Linear Dimensionality Reduction using Hypergraphs for Data Analysis Multilevel Linear Dimensionality Reuction using Hypergraphs for Data Analysis Haw-ren Fang Department of Computer Science an Engineering University of Minnesota; Minneapolis, MN 55455 hrfang@csumneu ABSTRACT

More information

Deep Spatial Pyramid for Person Re-identification

Deep Spatial Pyramid for Person Re-identification Deep Spatial Pyrami for Person Re-ientification Sławomir Bąk Peter Carr Disney Research Pittsburgh, PA, USA, 15213 {slawomir.bak,peter.carr}@isneyresearch.com Abstract Re-ientification refers to the task

More information

A Plane Tracker for AEC-automation Applications

A Plane Tracker for AEC-automation Applications A Plane Tracker for AEC-automation Applications Chen Feng *, an Vineet R. Kamat Department of Civil an Environmental Engineering, University of Michigan, Ann Arbor, USA * Corresponing author (cforrest@umich.eu)

More information

Using the disparity space to compute occupancy grids from stereo-vision

Using the disparity space to compute occupancy grids from stereo-vision The 2010 IEEE/RSJ International Conference on Intelligent Robots an Systems October 18-22, 2010, Taipei, Taiwan Using the isparity space to compute occupancy gris from stereo-vision Mathias Perrollaz,

More information

Non-homogeneous Generalization in Privacy Preserving Data Publishing

Non-homogeneous Generalization in Privacy Preserving Data Publishing Non-homogeneous Generalization in Privacy Preserving Data Publishing W. K. Wong, Nios Mamoulis an Davi W. Cheung Department of Computer Science, The University of Hong Kong Pofulam Roa, Hong Kong {wwong2,nios,cheung}@cs.hu.h

More information

Characterizing Decoding Robustness under Parametric Channel Uncertainty

Characterizing Decoding Robustness under Parametric Channel Uncertainty Characterizing Decoing Robustness uner Parametric Channel Uncertainty Jay D. Wierer, Wahee U. Bajwa, Nigel Boston, an Robert D. Nowak Abstract This paper characterizes the robustness of ecoing uner parametric

More information

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia Deep learning for dense per-pixel prediction Chunhua Shen The University of Adelaide, Australia Image understanding Classification error Convolution Neural Networks 0.3 0.2 0.1 Image Classification [Krizhevsky

More information

Coupling the User Interfaces of a Multiuser Program

Coupling the User Interfaces of a Multiuser Program Coupling the User Interfaces of a Multiuser Program PRASUN DEWAN University of North Carolina at Chapel Hill RAJIV CHOUDHARY Intel Corporation We have evelope a new moel for coupling the user-interfaces

More information

Fast Window Based Stereo Matching for 3D Scene Reconstruction

Fast Window Based Stereo Matching for 3D Scene Reconstruction The International Arab Journal of Information Technology, Vol. 0, No. 3, May 203 209 Fast Winow Base Stereo Matching for 3D Scene Reconstruction Mohamma Mozammel Chowhury an Mohamma AL-Amin Bhuiyan Department

More information

Discrete Markov Image Modeling and Inference on the Quadtree

Discrete Markov Image Modeling and Inference on the Quadtree 390 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 9, NO. 3, MARCH 2000 Discrete Markov Image Moeling an Inference on the Quatree Jean-Marc Laferté, Patrick Pérez, an Fabrice Heitz Abstract Noncasual Markov

More information

Random Clustering for Multiple Sampling Units to Speed Up Run-time Sample Generation

Random Clustering for Multiple Sampling Units to Speed Up Run-time Sample Generation DEIM Forum 2018 I4-4 Abstract Ranom Clustering for Multiple Sampling Units to Spee Up Run-time Sample Generation uzuru OKAJIMA an Koichi MARUAMA NEC Solution Innovators, Lt. 1-18-7 Shinkiba, Koto-ku, Tokyo,

More information

Instance-aware Semantic Segmentation via Multi-task Network Cascades

Instance-aware Semantic Segmentation via Multi-task Network Cascades Instance-aware Semantic Segmentation via Multi-task Network Cascades Jifeng Dai, Kaiming He, Jian Sun Microsoft research 2016 Yotam Gil Amit Nativ Agenda Introduction Highlights Implementation Further

More information

A Multi-scale Boosted Detector for Efficient and Robust Gesture Recognition

A Multi-scale Boosted Detector for Efficient and Robust Gesture Recognition A Multi-scale Booste Detector for Efficient an Robust Gesture Recognition Camille Monnier, Stan German, Anrey Ost Charles River Analytics Cambrige, MA, USA Abstract. We present an approach to etecting

More information

A Discrete Search Method for Multi-modal Non-Rigid Image Registration

A Discrete Search Method for Multi-modal Non-Rigid Image Registration A Discrete Search Metho for Multi-moal on-rigi Image Registration Alexaner Shekhovtsov Juan D. García-Arteaga Tomáš Werner Center for Machine Perception, Dept. of Cybernetics Faculty of Electrical Engineering,

More information

EXTRACTION OF MAIN AND SECONDARY ROADS IN VHR IMAGES USING A HIGHER-ORDER PHASE FIELD MODEL

EXTRACTION OF MAIN AND SECONDARY ROADS IN VHR IMAGES USING A HIGHER-ORDER PHASE FIELD MODEL EXTRACTION OF MAIN AND SECONDARY ROADS IN VHR IMAGES USING A HIGHER-ORDER PHASE FIELD MODEL Ting Peng a,b, Ian H. Jermyn b, Véronique Prinet a, Josiane Zerubia b a LIAMA & PR, Institute of Automation,

More information

Design of Policy-Aware Differentially Private Algorithms

Design of Policy-Aware Differentially Private Algorithms Design of Policy-Aware Differentially Private Algorithms Samuel Haney Due University Durham, NC, USA shaney@cs.ue.eu Ashwin Machanavajjhala Due University Durham, NC, USA ashwin@cs.ue.eu Bolin Ding Microsoft

More information

Divide-and-Conquer Algorithms

Divide-and-Conquer Algorithms Supplment to A Practical Guie to Data Structures an Algorithms Using Java Divie-an-Conquer Algorithms Sally A Golman an Kenneth J Golman Hanout Divie-an-conquer algorithms use the following three phases:

More information

Learning Subproblem Complexities in Distributed Branch and Bound

Learning Subproblem Complexities in Distributed Branch and Bound Learning Subproblem Complexities in Distribute Branch an Boun Lars Otten Department of Computer Science University of California, Irvine lotten@ics.uci.eu Rina Dechter Department of Computer Science University

More information

State Indexed Policy Search by Dynamic Programming. Abstract. 1. Introduction. 2. System parameterization. Charles DuHadway

State Indexed Policy Search by Dynamic Programming. Abstract. 1. Introduction. 2. System parameterization. Charles DuHadway State Inexe Policy Search by Dynamic Programming Charles DuHaway Yi Gu 5435537 503372 December 4, 2007 Abstract We consier the reinforcement learning problem of simultaneous trajectory-following an obstacle

More information

Conditional Random Fields as Recurrent Neural Networks

Conditional Random Fields as Recurrent Neural Networks BIL722 - Deep Learning for Computer Vision Conditional Random Fields as Recurrent Neural Networks S. Zheng, S. Jayasumana, B. Romera-Paredes V. Vineet, Z. Su, D. Du, C. Huang, P.H.S. Torr Introduction

More information

Multimodal Stereo Image Registration for Pedestrian Detection

Multimodal Stereo Image Registration for Pedestrian Detection Multimoal Stereo Image Registration for Peestrian Detection Stephen Krotosky an Mohan Trivei Abstract This paper presents an approach for the registration of multimoal imagery for peestrian etection when

More information

Generalized Edge Coloring for Channel Assignment in Wireless Networks

Generalized Edge Coloring for Channel Assignment in Wireless Networks Generalize Ege Coloring for Channel Assignment in Wireless Networks Chun-Chen Hsu Institute of Information Science Acaemia Sinica Taipei, Taiwan Da-wei Wang Jan-Jan Wu Institute of Information Science

More information

Online Appendix to: Generalizing Database Forensics

Online Appendix to: Generalizing Database Forensics Online Appenix to: Generalizing Database Forensics KYRIACOS E. PAVLOU an RICHARD T. SNODGRASS, University of Arizona This appenix presents a step-by-step iscussion of the forensic analysis protocol that

More information

arxiv: v2 [cs.lg] 22 Jan 2019

arxiv: v2 [cs.lg] 22 Jan 2019 Spatial Variational Auto-Encoing via Matrix-Variate Normal Distributions Zhengyang Wang Hao Yuan Shuiwang Ji arxiv:1705.06821v2 [cs.lg] 22 Jan 2019 Abstract The key iea of variational auto-encoers (VAEs)

More information

Message Transport With The User Datagram Protocol

Message Transport With The User Datagram Protocol Message Transport With The User Datagram Protocol User Datagram Protocol (UDP) Use During startup For VoIP an some vieo applications Accounts for less than 10% of Internet traffic Blocke by some ISPs Computer

More information

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network Liwen Zheng, Canmiao Fu, Yong Zhao * School of Electronic and Computer Engineering, Shenzhen Graduate School of

More information

A Comparative Evaluation of Iris and Ocular Recognition Methods on Challenging Ocular Images

A Comparative Evaluation of Iris and Ocular Recognition Methods on Challenging Ocular Images A Comparative Evaluation of Iris an Ocular Recognition Methos on Challenging Ocular Images Vishnu Naresh Boeti Carnegie Mellon University Pittsburgh, PA 523 naresh@cmu.eu Jonathon M Smereka Carnegie Mellon

More information

The Reconstruction of Graphs. Dhananjay P. Mehendale Sir Parashurambhau College, Tilak Road, Pune , India. Abstract

The Reconstruction of Graphs. Dhananjay P. Mehendale Sir Parashurambhau College, Tilak Road, Pune , India. Abstract The Reconstruction of Graphs Dhananay P. Mehenale Sir Parashurambhau College, Tila Roa, Pune-4030, Inia. Abstract In this paper we iscuss reconstruction problems for graphs. We evelop some new ieas lie

More information

Estimating Velocity Fields on a Freeway from Low Resolution Video

Estimating Velocity Fields on a Freeway from Low Resolution Video Estimating Velocity Fiels on a Freeway from Low Resolution Vieo Young Cho Department of Statistics University of California, Berkeley Berkeley, CA 94720-3860 Email: young@stat.berkeley.eu John Rice Department

More information

MORA: a Movement-Based Routing Algorithm for Vehicle Ad Hoc Networks

MORA: a Movement-Based Routing Algorithm for Vehicle Ad Hoc Networks : a Movement-Base Routing Algorithm for Vehicle A Hoc Networks Fabrizio Granelli, Senior Member, Giulia Boato, Member, an Dzmitry Kliazovich, Stuent Member Abstract Recent interest in car-to-car communications

More information

A shortest path algorithm in multimodal networks: a case study with time varying costs

A shortest path algorithm in multimodal networks: a case study with time varying costs A shortest path algorithm in multimoal networks: a case stuy with time varying costs Daniela Ambrosino*, Anna Sciomachen* * Department of Economics an Quantitative Methos (DIEM), University of Genoa Via

More information

An Algorithm for Building an Enterprise Network Topology Using Widespread Data Sources

An Algorithm for Building an Enterprise Network Topology Using Widespread Data Sources An Algorithm for Builing an Enterprise Network Topology Using Wiesprea Data Sources Anton Anreev, Iurii Bogoiavlenskii Petrozavosk State University Petrozavosk, Russia {anreev, ybgv}@cs.petrsu.ru Abstract

More information

Computer Organization

Computer Organization Computer Organization Douglas Comer Computer Science Department Purue University 250 N. University Street West Lafayette, IN 47907-2066 http://www.cs.purue.eu/people/comer Copyright 2006. All rights reserve.

More information

Open Access Adaptive Image Enhancement Algorithm with Complex Background

Open Access Adaptive Image Enhancement Algorithm with Complex Background Sen Orers for Reprints to reprints@benthamscience.ae 594 The Open Cybernetics & Systemics Journal, 205, 9, 594-600 Open Access Aaptive Image Enhancement Algorithm with Complex Bacgroun Zhang Pai * epartment

More information

Bayesian localization microscopy reveals nanoscale podosome dynamics

Bayesian localization microscopy reveals nanoscale podosome dynamics Nature Methos Bayesian localization microscopy reveals nanoscale poosome ynamics Susan Cox, Ewar Rosten, James Monypenny, Tijana Jovanovic-Talisman, Dylan T Burnette, Jennifer Lippincott-Schwartz, Gareth

More information

EFFICIENT STEREO MATCHING BASED ON A NEW CONFIDENCE METRIC. Won-Hee Lee, Yumi Kim, and Jong Beom Ra

EFFICIENT STEREO MATCHING BASED ON A NEW CONFIDENCE METRIC. Won-Hee Lee, Yumi Kim, and Jong Beom Ra th European Signal Processing Conference (EUSIPCO ) Bucharest, omania, August 7-3, EFFICIENT STEEO MATCHING BASED ON A NEW CONFIDENCE METIC Won-Hee Lee, Yumi Kim, an Jong Beom a Department of Electrical

More information

A fast embedded selection approach for color texture classification using degraded LBP

A fast embedded selection approach for color texture classification using degraded LBP A fast embee selection approach for color texture classification using egrae A. Porebski, N. Vanenbroucke an D. Hama Laboratoire LISIC - EA 4491 - Université u Littoral Côte Opale - 50, rue Ferinan Buisson

More information

Discriminative Filters for Depth from Defocus

Discriminative Filters for Depth from Defocus Discriminative Filters for Depth from Defocus Fahim Mannan an Michael S. Langer School of Computer Science, McGill University Montreal, Quebec HA 0E9, Canaa. {fmannan, langer}@cim.mcgill.ca Abstract Depth

More information

Offloading Cellular Traffic through Opportunistic Communications: Analysis and Optimization

Offloading Cellular Traffic through Opportunistic Communications: Analysis and Optimization 1 Offloaing Cellular Traffic through Opportunistic Communications: Analysis an Optimization Vincenzo Sciancalepore, Domenico Giustiniano, Albert Banchs, Anreea Picu arxiv:1405.3548v1 [cs.ni] 14 May 24

More information

Semantic Segmentation. Zhongang Qi

Semantic Segmentation. Zhongang Qi Semantic Segmentation Zhongang Qi qiz@oregonstate.edu Semantic Segmentation "Two men riding on a bike in front of a building on the road. And there is a car." Idea: recognizing, understanding what's in

More information

Using Ray Tracing for Site-Specific Indoor Radio Signal Strength Analysis 1

Using Ray Tracing for Site-Specific Indoor Radio Signal Strength Analysis 1 Using Ray Tracing for Site-Specific Inoor Raio Signal Strength Analysis 1 Michael Ni, Stephen Mann, an Jay Black Computer Science Department, University of Waterloo, Waterloo, Ontario, NL G1, Canaa Abstract

More information

Generalized Edge Coloring for Channel Assignment in Wireless Networks

Generalized Edge Coloring for Channel Assignment in Wireless Networks TR-IIS-05-021 Generalize Ege Coloring for Channel Assignment in Wireless Networks Chun-Chen Hsu, Pangfeng Liu, Da-Wei Wang, Jan-Jan Wu December 2005 Technical Report No. TR-IIS-05-021 http://www.iis.sinica.eu.tw/lib/techreport/tr2005/tr05.html

More information

A Convex Clustering-based Regularizer for Image Segmentation

A Convex Clustering-based Regularizer for Image Segmentation Vision, Moeling, an Visualization (2015) D. Bommes, T. Ritschel an T. Schultz (Es.) A Convex Clustering-base Regularizer for Image Segmentation Benjamin Hell (TU Braunschweig), Marcus Magnor (TU Braunschweig)

More information

Supplementary Material: Pixelwise Instance Segmentation with a Dynamically Instantiated Network

Supplementary Material: Pixelwise Instance Segmentation with a Dynamically Instantiated Network Supplementary Material: Pixelwise Instance Segmentation with a Dynamically Instantiated Network Anurag Arnab and Philip H.S. Torr University of Oxford {anurag.arnab, philip.torr}@eng.ox.ac.uk 1. Introduction

More information

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus Presented by: Rex Ying and Charles Qi Input: A Single RGB Image Estimate

More information

Optimal Distributed P2P Streaming under Node Degree Bounds

Optimal Distributed P2P Streaming under Node Degree Bounds Optimal Distribute P2P Streaming uner Noe Degree Bouns Shaoquan Zhang, Ziyu Shao, Minghua Chen, an Libin Jiang Department of Information Engineering, The Chinese University of Hong Kong Department of EECS,

More information

SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH

SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH Galen H Sasaki Dept Elec Engg, U Hawaii 2540 Dole Street Honolul HI 96822 USA Ching-Fong Su Fuitsu Laboratories of America 595 Lawrence Expressway

More information

Learning Polynomial Functions. by Feature Construction

Learning Polynomial Functions. by Feature Construction I Proceeings of the Eighth International Workshop on Machine Learning Chicago, Illinois, June 27-29 1991 Learning Polynomial Functions by Feature Construction Richar S. Sutton GTE Laboratories Incorporate

More information

MANJUSHA K.*, ANAND KUMAR M., SOMAN K. P.

MANJUSHA K.*, ANAND KUMAR M., SOMAN K. P. Journal of Engineering Science an echnology Vol. 13, No. 1 (2018) 141-157 School of Engineering, aylor s University IMPLEMENAION OF REJECION SRAEGIES INSIDE MALAYALAM CHARACER RECOGNIION SYSEM BASED ON

More information

Real Time On Board Stereo Camera Pose through Image Registration*

Real Time On Board Stereo Camera Pose through Image Registration* 28 IEEE Intelligent Vehicles Symposium Einhoven University of Technology Einhoven, The Netherlans, June 4-6, 28 Real Time On Boar Stereo Camera Pose through Image Registration* Fai Dornaika French National

More information

Detecting Overlapping Communities from Local Spectral Subspaces

Detecting Overlapping Communities from Local Spectral Subspaces Detecting Overlapping Communities from Local Spectral Subspaces Kun He, Yiwei Sun Huazhong University of Science an Technology Wuhan 430074, China Email: {brooklet60, yiweisun}@hust.eu.cn Davi Binel, John

More information

Stereo Vision-based Subpixel Level Free Space Boundary Detection Using Modified u-disparity and Preview Dynamic Programming

Stereo Vision-based Subpixel Level Free Space Boundary Detection Using Modified u-disparity and Preview Dynamic Programming 2015 IEEE Intelligent Vehicles Symposium (IV) June 28 - July 1, 2015. COEX, Seoul, Korea Stereo Vision-base Subpixel Level Free Space Bounary Detection Using Moifie u-isparity an Preview Dynamic Programming

More information

Fully Convolutional Networks for Semantic Segmentation

Fully Convolutional Networks for Semantic Segmentation Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell UC Berkeley Chaim Ginzburg for Deep Learning seminar 1 Semantic Segmentation Define a pixel-wise labeling

More information

A Neural Network Model Based on Graph Matching and Annealing :Application to Hand-Written Digits Recognition

A Neural Network Model Based on Graph Matching and Annealing :Application to Hand-Written Digits Recognition ITERATIOAL JOURAL OF MATHEMATICS AD COMPUTERS I SIMULATIO A eural etwork Moel Base on Graph Matching an Annealing :Application to Han-Written Digits Recognition Kyunghee Lee Abstract We present a neural

More information

On the Placement of Internet Taps in Wireless Neighborhood Networks

On the Placement of Internet Taps in Wireless Neighborhood Networks 1 On the Placement of Internet Taps in Wireless Neighborhoo Networks Lili Qiu, Ranveer Chanra, Kamal Jain, Mohamma Mahian Abstract Recently there has emerge a novel application of wireless technology that

More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun Presented by Tushar Bansal Objective 1. Get bounding box for all objects

More information

Queueing Model and Optimization of Packet Dropping in Real-Time Wireless Sensor Networks

Queueing Model and Optimization of Packet Dropping in Real-Time Wireless Sensor Networks Queueing Moel an Optimization of Packet Dropping in Real-Time Wireless Sensor Networks Marc Aoun, Antonios Argyriou, Philips Research, Einhoven, 66AE, The Netherlans Department of Computer an Communication

More information

Sampling the Disparity Space Image

Sampling the Disparity Space Image Accepte for publication, June 2003. To appear in IEEE Transactions on Pattern Analysis an Machine Intelligence (PAMI) Sampling the Disparity Space Image Richar Szeliski Daniel Scharstein Microsoft Research

More information

Software Reliability Modeling and Cost Estimation Incorporating Testing-Effort and Efficiency

Software Reliability Modeling and Cost Estimation Incorporating Testing-Effort and Efficiency Software Reliability Moeling an Cost Estimation Incorporating esting-effort an Efficiency Chin-Yu Huang, Jung-Hua Lo, Sy-Yen Kuo, an Michael R. Lyu -+ Department of Electrical Engineering Computer Science

More information

Fast Fractal Image Compression using PSO Based Optimization Techniques

Fast Fractal Image Compression using PSO Based Optimization Techniques Fast Fractal Compression using PSO Base Optimization Techniques A.Krishnamoorthy Visiting faculty Department Of ECE University College of Engineering panruti rishpci89@gmail.com S.Buvaneswari Visiting

More information

CS 106 Winter 2016 Craig S. Kaplan. Module 01 Processing Recap. Topics

CS 106 Winter 2016 Craig S. Kaplan. Module 01 Processing Recap. Topics CS 106 Winter 2016 Craig S. Kaplan Moule 01 Processing Recap Topics The basic parts of speech in a Processing program Scope Review of syntax for classes an objects Reaings Your CS 105 notes Learning Processing,

More information

On the Role of Multiply Sectioned Bayesian Networks to Cooperative Multiagent Systems

On the Role of Multiply Sectioned Bayesian Networks to Cooperative Multiagent Systems On the Role of Multiply Sectione Bayesian Networks to Cooperative Multiagent Systems Y. Xiang University of Guelph, Canaa, yxiang@cis.uoguelph.ca V. Lesser University of Massachusetts at Amherst, USA,

More information

Supporting Fully Adaptive Routing in InfiniBand Networks

Supporting Fully Adaptive Routing in InfiniBand Networks XIV JORNADAS DE PARALELISMO - LEGANES, SEPTIEMBRE 200 1 Supporting Fully Aaptive Routing in InfiniBan Networks J.C. Martínez, J. Flich, A. Robles, P. López an J. Duato Resumen InfiniBan is a new stanar

More information

Image Segmentation using K-means clustering and Thresholding

Image Segmentation using K-means clustering and Thresholding Image Segmentation using Kmeans clustering an Thresholing Preeti Panwar 1, Girhar Gopal 2, Rakesh Kumar 3 1M.Tech Stuent, Department of Computer Science & Applications, Kurukshetra University, Kurukshetra,

More information

Loop Scheduling and Partitions for Hiding Memory Latencies

Loop Scheduling and Partitions for Hiding Memory Latencies Loop Scheuling an Partitions for Hiing Memory Latencies Fei Chen Ewin Hsing-Mean Sha Dept. of Computer Science an Engineering University of Notre Dame Notre Dame, IN 46556 Email: fchen,esha @cse.n.eu Tel:

More information

Intensive Hypercube Communication: Prearranged Communication in Link-Bound Machines 1 2

Intensive Hypercube Communication: Prearranged Communication in Link-Bound Machines 1 2 This paper appears in J. of Parallel an Distribute Computing 10 (1990), pp. 167 181. Intensive Hypercube Communication: Prearrange Communication in Link-Boun Machines 1 2 Quentin F. Stout an Bruce Wagar

More information

Solution Representation for Job Shop Scheduling Problems in Ant Colony Optimisation

Solution Representation for Job Shop Scheduling Problems in Ant Colony Optimisation Solution Representation for Job Shop Scheuling Problems in Ant Colony Optimisation James Montgomery, Carole Faya 2, an Sana Petrovic 2 Faculty of Information & Communication Technologies, Swinburne University

More information

Semantic Segmentation

Semantic Segmentation Semantic Segmentation UCLA:https://goo.gl/images/I0VTi2 OUTLINE Semantic Segmentation Why? Paper to talk about: Fully Convolutional Networks for Semantic Segmentation. J. Long, E. Shelhamer, and T. Darrell,

More information

Adjacency Matrix Based Full-Text Indexing Models

Adjacency Matrix Based Full-Text Indexing Models 1000-9825/2002/13(10)1933-10 2002 Journal of Software Vol.13, No.10 Ajacency Matrix Base Full-Text Inexing Moels ZHOU Shui-geng 1, HU Yun-fa 2, GUAN Ji-hong 3 1 (Department of Computer Science an Engineering,

More information

Performance Modelling of Necklace Hypercubes

Performance Modelling of Necklace Hypercubes erformance Moelling of ecklace ypercubes. Meraji,,. arbazi-aza,, A. atooghy, IM chool of Computer cience & harif University of Technology, Tehran, Iran {meraji, patooghy}@ce.sharif.eu, aza@ipm.ir a Abstract

More information

Evolutionary Optimisation Methods for Template Based Image Registration

Evolutionary Optimisation Methods for Template Based Image Registration Evolutionary Optimisation Methos for Template Base Image Registration Lukasz A Machowski, Tshilizi Marwala School of Electrical an Information Engineering University of Witwatersran, Johannesburg, South

More information

Modifying ROC Curves to Incorporate Predicted Probabilities

Modifying ROC Curves to Incorporate Predicted Probabilities Moifying ROC Curves to Incorporate Preicte Probabilities Cèsar Ferri DSIC, Universitat Politècnica e València Peter Flach Department of Computer Science, University of Bristol José Hernánez-Orallo DSIC,

More information

Estimation of large-amplitude motion and disparity fields: Application to intermediate view reconstruction

Estimation of large-amplitude motion and disparity fields: Application to intermediate view reconstruction c 2000 SPIE. Personal use of this material is permitte. However, permission to reprint/republish this material for avertising or promotional purposes or for creating new collective works for resale or

More information

Improving Web Search Ranking by Incorporating User Behavior Information

Improving Web Search Ranking by Incorporating User Behavior Information Improving Web Search Ranking by Incorporating User Behavior Information Eugene Agichtein Microsoft Research eugeneag@microsoft.com Eric Brill Microsoft Research brill@microsoft.com Susan Dumais Microsoft

More information

Short-term prediction of photovoltaic power based on GWPA - BP neural network model

Short-term prediction of photovoltaic power based on GWPA - BP neural network model Short-term preiction of photovoltaic power base on GWPA - BP neural networ moel Jian Di an Shanshan Meng School of orth China Electric Power University, Baoing. China Abstract In recent years, ue to China's

More information

RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation : Multi-Path Refinement Networks for High-Resolution Semantic Segmentation Guosheng Lin 1 Anton Milan 2 Chunhua Shen 2,3 Ian Reid 2,3 1 Nanyang Technological University 2 University of Adelaide 3 Australian

More information

William S. Law. Erik K. Antonsson. Engineering Design Research Laboratory. California Institute of Technology. Abstract

William S. Law. Erik K. Antonsson. Engineering Design Research Laboratory. California Institute of Technology. Abstract Optimization Methos for Calculating Design Imprecision y William S. Law Eri K. Antonsson Engineering Design Research Laboratory Division of Engineering an Applie Science California Institute of Technology

More information

Tight Wavelet Frame Decomposition and Its Application in Image Processing

Tight Wavelet Frame Decomposition and Its Application in Image Processing ITB J. Sci. Vol. 40 A, No., 008, 151-165 151 Tight Wavelet Frame Decomposition an Its Application in Image Processing Mahmu Yunus 1, & Henra Gunawan 1 1 Analysis an Geometry Group, FMIPA ITB, Banung Department

More information

EXACT SIMULATION OF A BOOLEAN MODEL

EXACT SIMULATION OF A BOOLEAN MODEL Original Research Paper oi:10.5566/ias.v32.p101-105 EXACT SIMULATION OF A BOOLEAN MODEL CHRISTIAN LANTUÉJOULB MinesParisTech 35 rue Saint-Honoré 77305 Fontainebleau France e-mail: christian.lantuejoul@mines-paristech.fr

More information

Spatial Localization and Detection. Lecture 8-1

Spatial Localization and Detection. Lecture 8-1 Lecture 8: Spatial Localization and Detection Lecture 8-1 Administrative - Project Proposals were due on Saturday Homework 2 due Friday 2/5 Homework 1 grades out this week Midterm will be in-class on Wednesday

More information

Chapter 5 Proposed models for reconstituting/ adapting three stereoscopes

Chapter 5 Proposed models for reconstituting/ adapting three stereoscopes Chapter 5 Propose moels for reconstituting/ aapting three stereoscopes - 89 - 5. Propose moels for reconstituting/aapting three stereoscopes This chapter offers three contributions in the Stereoscopy area,

More information

Comparison of Methods for Increasing the Performance of a DUA Computation

Comparison of Methods for Increasing the Performance of a DUA Computation Comparison of Methos for Increasing the Performance of a DUA Computation Michael Behrisch, Daniel Krajzewicz, Peter Wagner an Yun-Pang Wang Institute of Transportation Systems, German Aerospace Center,

More information