arxiv: v1 [cs.cv] 23 Mar 2018


LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image

Chuhang Zou, Alex Colburn, Qi Shan, Derek Hoiem
University of Illinois at Urbana-Champaign, Zillow Group
{czou4, dhoiem}@illinois.edu, {alexco, qis}@zillow.com

Abstract

We propose an algorithm to predict room layout from a single image that generalizes across panoramas and perspective images, cuboid layouts and more general layouts (e.g. L-shaped rooms). Our method operates directly on the panoramic image, rather than decomposing it into perspective images as recent works do. Our network architecture is similar to that of RoomNet [16], but we show improvements due to aligning the image based on vanishing points, predicting multiple layout elements (corners, boundaries, size and translation), and fitting a constrained Manhattan layout to the resulting predictions. Our method compares well in speed and accuracy to other existing work on panoramas, achieves among the best accuracy for perspective images, and can handle both cuboid-shaped and more general Manhattan layouts.

1. Introduction

Estimating the 3D layout of a room from one image is an important goal, with applications such as robotics and virtual/augmented reality. The room layout specifies the positions, orientations, and heights of the walls, relative to the camera center. The layout can be represented as a set of projected corner positions or boundaries, or as a 3D mesh. Existing works apply to special cases of the problem, such as predicting cuboid-shaped layouts from perspective images or from panoramic images.

We present LayoutNet, a deep convolutional neural network (CNN) that estimates the 3D layout of an indoor scene from a single perspective or panoramic image (Figure 1). Our method compares well in speed and accuracy on panoramas and is among the best on perspective images. Our method also generalizes to non-cuboid Manhattan layouts, such as L-shaped rooms. Code is available at: LayoutNet.
Figure 1. Illustration. Our LayoutNet predicts a non-cuboid room layout from a single panorama under equirectangular projection.

Our LayoutNet approach operates in three steps (Figure 2). First, our system analyzes the vanishing points and aligns the image to be level with the floor (Sec. 3.1). This alignment ensures that wall-wall boundaries are vertical lines and substantially reduces error according to our experiments. In the second step, corner (layout junction) and boundary probability maps are predicted directly on the image using a CNN with an encoder-decoder structure and skip connections (Sec. 3.2). Corners and boundaries each provide a complete representation of room layout. We find that jointly predicting them in a single network leads to better estimation. Finally, the 3D layout parameters are optimized to fit the predicted corners and boundaries (Sec. 3.4). The final 3D layout loss from our optimization process is difficult to back-propagate through the network, but direct regression of the 3D parameters during training serves as an effective substitute, encouraging predictions that maximize accuracy of the end result.

Our contributions are: We propose a more general RGB image to layout algorithm that is suitable for perspective and panoramic
images with Manhattan layouts. Our system compares well in speed and accuracy for panoramic images and achieves the second best for perspective images, while also being the fastest. We demonstrate gains from using precomputed vanishing point cues, geometric constraints, and postprocess optimization, indicating that deep network approaches still benefit from explicit geometric cues and constraints. We also show that adding an objective to directly regress 3D layout parameters leads to better predictions of the boundaries and corners that are used to solve for the final predicted layout. We extend the annotations for the Stanford 2D-3D dataset [1], providing room layout annotations that can be used in future work.

Figure 2. Overview. Our LayoutNet follows the encoder-decoder strategy. The network input is a concatenation of a single RGB panorama and Manhattan line map. The network jointly predicts layout boundaries and corner positions. The 3D layout parameter loss encourages predictions that maximize accuracy. The final prediction is a Manhattan constrained layout reconstruction. Best viewed in color. (Panel labels: Input Panorama, Manhattan Lines, Boundary Map (m_E), Corner Map (m_C), 3D Layout Parameter Regressor (for training only), Manhattan Layout Optimizer.)

2. Related Work

Single-view room layout estimation has been an active topic of research for the past ten years. Delage et al. [7] fit floor/wall boundaries in a perspective image taken by a level camera to create a 3D model under Manhattan world assumptions [3]. The Manhattan world assumptions are that all walls are at right angles to each other and perpendicular to the floor. A special case is the cuboid model, in which four walls, ceiling, and floor enclose the room. Lee et al. [18] produce Orientation Maps, generate layout hypotheses based on detected line segments, and select a best-fitting layout from among them. Hedau et al.
[11] recover cuboid layouts by solving for three vanishing points, sampling layouts consistent with those vanishing points, and selecting the best layout based on edge and Geometric Context [13] consistencies. Subsequent works follow a similar approach, with improvements to layout generation [27, 28, 23], features for scoring layouts [28, 23], and incorporation of object hypotheses [12, 17, 5, 6, 34] or other context. The most recent methods train deep network features to classify pixels into layout surfaces (walls, floor, ceiling) [4, 14], boundaries [22], corners [16], or a combination [25].

Nearly all of these works aim to produce cuboid-shaped layouts from perspective RGB images. A few works also operate on panoramic images. Zhang et al. [33] propose a dataset and method to estimate room layout from 360° panoramic images (more on this later). Yang et al. [31] recover layouts from panoramas based on edge cues, Geometric Context, and other priors. Xu et al. [30] estimate layout based on surface orientation estimates and object hypotheses. Other works recover indoor layout from multiple images (e.g., [2]) or RGBD images (e.g., [29, 32, 10, 19]), where estimates rely heavily on 3D points obtained from sensors or multiview constraints. Rent3D [20] takes advantage of a known floor plan. Our approach simplifies reconstruction by estimating layout directly on a single RGB equirectangular panorama. Our final output is a sparse and compact planar Manhattan layout parameterized by each wall's distance to the camera, height, and the layout rotation.

Our work is most similar in goal to [33] and in approach to RoomNet [16]. Zhang et al. [33] extend the frameworks designed for perspective images to panoramas, estimating vanishing points, generating hypotheses, and scoring hypotheses according to Orientation Maps, Geometric Context, and object hypotheses.
To compute these features, the method of [33] first projects the panoramic image into multiple overlapping perspective images, and then combines the feature maps back into a panoramic image. Our approach is more direct: after aligning the panoramic image based on vanishing points, our system uses a deep network to predict boundaries and corners directly on the panoramic image. In this regard, we are similar to RoomNet, which uses a deep network to directly predict layout corners in perspective images, as well as a label that indicates which corners are visible. Our method differs from RoomNet in several ways. Our method applies to panoramic images. Our method also differs in the alignment
step (RoomNet performs none) and in our multitask prediction of boundaries, corners, and 3D cuboid parameters. Our final inference is constrained to produce a Manhattan 3D layout. RoomNet uses an RNN to refine 2D corner position predictions, but those predictions might not be consistent with any 3D cuboid layout. Our experiments show that all of these differences improve results.

More generally, we propose the first method, to our knowledge, that applies to both perspective and panoramic images. We also show that our method extends easily to non-cuboid Manhattan layouts. Thus, our method is arguably the most general and effective approach to date for indoor layout estimation from a single RGB image.

3. Approach

We first describe our method for predicting cuboid-shaped layouts from panoramas: alignment (Sec. 3.1), corner and boundary prediction with a CNN (Sec. 3.2 and 3.3), and optimization of 3D cuboid parameters (Sec. 3.4). Then, we describe modifications to predict more general (non-cuboid) Manhattan layouts and to handle perspective images (Sec. 3.5).

3.1. Panoramic image alignment

Given a panorama that covers a 360° horizontal field of view, we first align the image by estimating the floor plane direction under spherical projection, rotating the scene, and reprojecting it to the 2D equirectangular projection. Similar to Zhang et al.'s approach [33], we select long line segments using the Line Segment Detector (LSD) [24] in each overlapped perspective view, then vote for three mutually orthogonal vanishing directions using the Hough Transform. This pre-processing step eases our network training. The detected candidate Manhattan line segments also provide additional input features that improve the performance, as shown in our experiments.

3.2. Network structure

An overview of the LayoutNet network is illustrated in Fig. 2. The network follows an encoder-decoder strategy.
Deep panorama encoder: The input is a 6-channel feature map: the concatenation of a single RGB panorama with resolution of (or for perspective images) and the Manhattan line feature map lying on three orthogonal vanishing directions, computed using the alignment method in Sec. 3.1. The encoder contains 7 convolution layers with kernel size of 3×3. Each convolution is followed by a ReLU operation and a max pooling layer with a down-sampling factor of 2. The first convolution contains 32 features, and we double the number of features after each convolution. This deep structure ensures better feature learning from high resolution images and helps ease the decoding step. We tried Batch Normalization after each convolution layer but observed lower accuracy. We also explored an alternative structure that applies a separate encoder for the input image and the Manhattan lines, but observed no increase in performance compared to our current simpler design.

2D layout decoder: The decoder consists of two branches as shown in Fig. 2. The top branch, the layout boundary map (m_E) predictor, decodes the bottleneck feature into a 2D feature map with the same resolution as the input. m_E is a 3-channel probability prediction of wall-wall, ceiling-wall and wall-floor boundaries on the panorama, for both visible and occluded boundaries. The boundary predictor contains 7 layers of nearest neighbor up-sampling, each followed by a convolution layer with kernel size of 3×3, and the feature size is halved through layers from The final layer is a Sigmoid operation. We add skip connections to each convolution layer following the spirit of the U-Net structure [26], in order to prevent the predictions from shifting in the up-sampling step. The lower branch, the 2D layout corner map (m_C) predictor, follows the same structure as the boundary map predictor and additionally receives skip connections from the top branch for each convolution layer.
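As a rough illustration of the deep panorama encoder described in this section, the sketch below traces how the feature-map shape would evolve through 7 stages of 3×3 convolution, ReLU, and 2× max pooling, with 32 features in the first convolution and doubling thereafter. The function name `encoder_shapes`, the assumption that each convolution preserves spatial size ("same" padding), and the 512 × 1024 input size are illustrative assumptions, not values taken from the text.

```python
def encoder_shapes(height, width, num_layers=7, first_features=32):
    """Trace (channels, height, width) after each conv + ReLU + 2x max-pool stage."""
    shapes = []
    channels = first_features
    for _ in range(num_layers):
        # Assumption: the 3x3 convolution uses 'same' padding, so only the
        # max pooling that follows changes the spatial size (halving it).
        height //= 2
        width //= 2
        shapes.append((channels, height, width))
        channels *= 2  # the feature count doubles after each convolution
    return shapes


# Hypothetical 512 x 1024 panorama, chosen only for illustration.
for stage, shape in enumerate(encoder_shapes(512, 1024), start=1):
    print(stage, shape)
```

Under these assumptions the bottleneck is spatially tiny but has many channels, which is consistent with the decoder needing 7 up-sampling layers to return to the input resolution.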
The skip connections from the boundary branch to the corner branch stem from the intuition that layout boundaries imply corner positions, especially when a corner is occluded. We show in our experiments (Sec. 4) that the joint prediction helps improve the accuracy of both maps, leading to a better 3D reconstruction result. We experimented with fully convolutional layers [21] instead of the up-sampling plus convolutions structure, but observed worse performance with checkerboard artifacts.

3D layout regressor: The function that maps 2D corners and boundaries to 3D layout parameters is simple mathematically, but difficult to learn. So we train a regressor for 3D layout parameters with the purpose of producing better corners and boundaries, rather than for its own sake. As shown in Fig. 2, the 3D regressor takes as input the concatenation of the two predicted 2D maps and predicts the parameters of the 3D layout. We parameterize the layout with 6 parameters, assuming the ground plane is aligned with the x-z plane: width s_w, length s_l, height s_h, translation T = (t_x, t_z) and rotation r_θ on the x-z plane. The regressor follows an encoder structure with 7 layers of convolution with kernel size 3×3, each followed by a ReLU operation and a max pooling layer with a down-sampling factor of 2. The convolution feature size doubles through the layers, starting from the 4-channel input. The next four fully-connected layers have sizes of 1024, 256, 64, and 6, with ReLU in between. The output 1×6 feature vector d = {s_w, s_l, s_h, t_x, t_z, r_θ} is our predicted 3D cuboid parameter. Note that the regressor outputs the parameters of the 3D layout that can be projected back to the 2D image, presenting an end-to-end prediction approach. We observed that the 3D regressor is not accurate (with corner error of
3.36% in the dataset of [33], compared with other results in Table 1), but including it in the loss objective tends to slightly improve the predictions of the network. The direct 3D regressor fails because small position shifts in 2D can produce a large difference in the 3D shape, making the network hard to train.

Loss function. The overall loss function of the network is given in Eq. 1:

$$L(m_E, m_C, d) = -\alpha \frac{1}{n} \sum_{p \in m_E} \big(\hat{p} \log p + (1 - \hat{p}) \log(1 - p)\big) - \beta \frac{1}{n} \sum_{q \in m_C} \big(\hat{q} \log q + (1 - \hat{q}) \log(1 - q)\big) + \tau \lVert d - \hat{d} \rVert_2 \qquad (1)$$

The loss is the sum of the binary cross entropy errors of the predicted pixel probabilities in m_E and m_C compared to ground truth, plus the Euclidean distance of the regressed 3D cuboid parameters d to the ground truth \hat{d}. p is the probability of one pixel in m_E, and \hat{p} is the ground truth of p in m_E. q is the pixel probability in m_C, and \hat{q} is the ground truth. n is the number of pixels in m_E and m_C, which is the image resolution. Note that the RoomNet approach [16] uses L2 loss for corner prediction. We discuss the performance using the two different losses in Sec. 4. α, β and τ are the weights for each loss term. In our experiment, we set α = β = 1 and τ =

3.3. Training details

Our LayoutNet predicts pixel probabilities for corners and boundaries and regresses the 3D layout parameters. We find that joint training from a randomly initialized network sometimes fails to converge. Instead, we train each subnetwork separately and then train them jointly. For the 2D layout prediction network, we first train on the layout boundary prediction task to initialize the parameters of the network. For the 3D layout regressor, we first train the network with ground truth layout boundaries and corners as input, and then connect it with the 2D layout decoder and train the whole network end-to-end. The input Manhattan line map is a 3-channel 0-1 tensor. We normalize each of the 3D cuboid parameters to zero mean and unit standard deviation across training samples.
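As a minimal sketch of the loss in Eq. 1, the code below combines the two binary cross entropy terms with the Euclidean distance between the regressed and ground truth 3D parameters. It works on flattened lists of pixel probabilities for simplicity. The function names, the clamping constant `eps`, and the value of `tau` used in the usage lines are assumptions made for illustration; `tau` is left as a required argument.

```python
import math


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross entropy over flattened pixel probabilities."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0); eps is an assumption
        total += t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return -total / len(pred)


def layout_loss(m_e, m_e_gt, m_c, m_c_gt, d, d_gt, tau, alpha=1.0, beta=1.0):
    """Boundary BCE + corner BCE + tau * Euclidean distance of the 3D parameters."""
    boundary_term = alpha * binary_cross_entropy(m_e, m_e_gt)
    corner_term = beta * binary_cross_entropy(m_c, m_c_gt)
    param_term = tau * math.dist(d, d_gt)
    return boundary_term + corner_term + param_term


# Toy inputs: two "pixels" per map and a 6-vector of layout parameters.
loss = layout_loss(
    m_e=[0.9, 0.1], m_e_gt=[1.0, 0.0],
    m_c=[0.8, 0.2], m_c_gt=[1.0, 0.0],
    d=[1.0, 2.0, 1.0, 0.0, 0.0, 0.1], d_gt=[1.0, 2.0, 1.0, 0.0, 0.0, 0.0],
    tau=0.5,  # illustrative value only
)
print(loss)
```

Because the parameter term only enters through `tau`, setting `tau` to zero recovers a purely 2D training objective, which mirrors the role of the regressor as an auxiliary signal.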
We use ADAM [15] to update network parameters with a learning rate of e−4, α = 0.95 and ε = e−6. The batch size for training the 2D layout prediction network is 5 and changes to 20 for training the 3D regressor. The whole end-to-end training uses a batch size of 20.

Ground truth smoothing: Our target 2D boundary and corner map is a binary map with a thin curve or point on the image. This makes training more difficult. For example, if the network predicts the corner position slightly off the ground truth, a huge penalty will be incurred. Instead, we dilate the ground truth boundary and corner map with a factor of 4 and then smooth the image with a Gaussian kernel of Note that even after smoothing, the target image still contains 95% zero values, so we re-weight the back propagated gradients of the background pixels by multiplying with 0.2.

Data augmentation: We use horizontal rotation, left-right flipping and luminance change to augment the training samples. The horizontal rotation varies from 0° to 360°. The luminance varies with γ values between For perspective images, we apply ±10° rotation on the image plane.

Algorithm 1 3D layout optimization
  Given panorama I, layout corner prediction m_C, and boundary prediction m_E;
  Initialize 3D layout L_0 based on Eq. 2;
  E_best = Score(L_0) by Eq. 3, L_best = L_0;
  for i = 1 : wallnum do
    Sample candidate layouts L_i by varying wall position w_i in 3D, fix other wall positions;
    for j = 1 : |L_i| do
      Sample candidate layouts L_ij by varying floor and ceiling position in 3D;
    Rank the best scored layout L_B ∈ {L_ij} based on Eq. 3;
    if E_best < Score(L_B) then
      E_best = Score(L_B), L_best = L_B;
    Update w_i from L_best, fix it for following sampling
  return L_best

3.4. 3D layout optimization

The initial 2D corner predictions are obtained from the corner probability maps that our network outputs. First, the responses are summed across rows, to get a summed response for each column.
Then, local maxima are found in the column responses, with a distance between local maxima of at least 20 pixels. Finally, the two largest peaks are found along the selected columns. These 2D corners might not satisfy Manhattan constraints, so we perform optimization to refine the estimates.

Given the predicted corner positions, we can directly recover the camera position and 3D layout, up to a scale and translation, by assuming that the bottom corners are on the same ground plane and that the top corners are directly above the bottom ones. We can further constrain the layout shape to be Manhattan, so that intersecting walls are perpendicular, e.g. like a cuboid or L-shape in a top-down view. For panoramic images, the Manhattan constraints can be easily incorporated, by utilizing the characteristic that the columns of the panorama correspond to rotation angles of the camera. We parameterize the layout coordinates in the top-down view as a vector of 2D points L_v = {v_1 = (0, 0), v_2 = (x_1, y_1), ..., v_N = (x_N, y_N)}.
v_1 resolves the translation ambiguity, and \lVert v_1 - v_2 \rVert = 1 sets the scale. Because the layout is assumed to be Manhattan, neighboring vertices will share one coordinate value, which further reduces the number of free parameters. We recover the camera position v_c = {x_c, y_c} and L_v based on the following generalized energy minimization inspired by Farin et al. [8]:

$$E(L_v, v_c) = \min_{v_c, L_v} \sum_{(i,j) \in L_v} \left| \beta(v_i, v_j) - \alpha(v_i, v_j) \right| \qquad (2)$$

where v_i, v_j are pairs of neighboring vertices, and

$$\beta_{ij} = \arccos \frac{(v_i - v_c) \cdot (v_j - v_c)}{\lVert v_i - v_c \rVert \, \lVert v_j - v_c \rVert}$$

is the rotation angle of the camera v_c between v_i and v_j. We denote α_ij as the pixel-wise horizontal distance on the image between v_i and v_j divided by the length of the panorama. Note that this L2 minimization also applies to general Manhattan layouts. We use L-BFGS [35] to solve Eq. 2 efficiently.

We initialize the ceiling level as the average (mean) of the 3D upper-corner heights, and then optimize for a better fitting room layout, relying on both corner and boundary information, using the following score to evaluate a 3D layout candidate L:

$$\mathrm{Score}(L) = w_{junc} \sum_{l_c \in C} \log P_{corner}(l_c) + w_{ceil} \sum_{l_e \in L_e} \max \log P_{ceil}(l_e) + w_{floor} \sum_{l_f \in L_f} \max \log P_{floor}(l_f) \qquad (3)$$

where C denotes the 2D projected corner positions of L. Cardinality of L is #walls × 2. We connect the nearby corners on the image to obtain L_e, which is the set of projected wall-ceiling boundaries, and L_f, which is the set of projected wall-floor boundaries (each with cardinality of #walls). P_corner(·) denotes the pixel-wise probability value on the predicted m_C. P_ceil(·) and P_floor(·) denote the probability on m_E. The 2nd and 3rd terms take the maximum value of the log likelihood response in each boundary l_e ∈ L_e and l_f ∈ L_f. w_junc, w_ceil and w_floor are the term weights, which we set to 1.0, 0.5 and 1.0 respectively using grid search.
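The scoring function in Eq. 3 can be sketched as follows, assuming each probability map is a 2D list indexed as `[row][col]`, corners are `(row, col)` pixels, and each projected boundary is given as a list of the pixels it passes through. These data layouts, the function name `layout_score`, and the clamping constant `eps` are illustrative assumptions; only the term weights come from the text.

```python
import math


def layout_score(corners, ceil_edges, floor_edges, p_corner, p_ceil, p_floor,
                 w_junc=1.0, w_ceil=0.5, w_floor=1.0, eps=1e-6):
    """Score a projected layout candidate against predicted probability maps."""

    def log_p(prob_map, pixel):
        row, col = pixel
        # Clamp so that zero probabilities do not produce log(0).
        return math.log(max(prob_map[row][col], eps))

    # Sum of log corner probabilities over all projected corners.
    corner_term = sum(log_p(p_corner, c) for c in corners)
    # For each boundary, keep the maximum log response along its pixels.
    ceil_term = sum(max(log_p(p_ceil, px) for px in edge) for edge in ceil_edges)
    floor_term = sum(max(log_p(p_floor, px) for px in edge) for edge in floor_edges)
    return w_junc * corner_term + w_ceil * ceil_term + w_floor * floor_term


# Toy 2 x 2 maps with one corner, one ceiling boundary and one floor boundary.
p_corner = [[0.5, 1.0], [1.0, 1.0]]
p_ceil = [[0.25, 0.5], [0.1, 0.1]]
p_floor = [[0.1, 0.1], [1.0, 0.1]]
score = layout_score(
    corners=[(0, 0)],
    ceil_edges=[[(0, 0), (0, 1)]],
    floor_edges=[[(1, 0), (1, 1)]],
    p_corner=p_corner, p_ceil=p_ceil, p_floor=p_floor,
)
print(score)
```

A higher (less negative) score means the candidate's projected corners and boundaries fall on pixels the network considers likely, so candidates can be ranked by this value directly.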
The weights in Eq. 3 conform with the observation that wall-floor corners are often occluded, and the predicted boundaries could help improve the layout reconstruction. We find that adding wall-wall boundaries in the scoring function helps less, since the vertical pairs of predicted corners already reveal the wall-wall boundary information.

Directly optimizing Eq. 3 is computationally expensive, since we penalize on 2D projections but not direct 3D properties. We therefore sample candidate layout shapes and select the best scoring result based on Eq. 3. We use line search to prune the number of candidates and speed up the optimization. Algorithm 1 demonstrates the procedure. In each step, we sample candidate layouts by shifting one of the wall positions within ±10% of its distance to the camera center. Each candidate's ceiling and floor level is then optimized based on the same sampling strategy and scored based on Eq. 3. Once we find the best scored layout by moving one of the walls, we fix this wall position, move to the next wall and perform the sampling again. We start from the least confident wall based on our boundary predictions. In total, 1000 layout candidates are sampled. The optimization step takes less than 30 sec for each image and produces better 3D layouts, as demonstrated in our experiments.

3.5. Extensions

With small modifications, our network, originally designed to predict cuboid layouts from panoramas, can also predict more general Manhattan layouts from panoramas and cuboid layouts from perspective images.

General Manhattan layouts: To enable more general layouts, we include training examples that have more than four walls visible (e.g. L-shaped rooms), which applies to about 10% of examples. We then determine whether to generate four or six walls by thresholding the score of the sixth strongest wall-wall boundary.
Specifically, the average probability along the sixth strongest column of the corner map is at least In other words, if there is evidence for more than four walls, our system generates additional walls; otherwise it generates four. Since the available test sets do not have many examples with more than four walls, we show qualitative results with our additional captured samples in Sec. 4.2 and in the supplemental material.

Note that there will be multiple solutions for a non-cuboid layout when solving Eq. 2. We experimented with predicting a concave/convex label as part of the corner map prediction to obtain a single solution, but observed degraded 2D prediction. We thus enumerate all possible shapes (e.g. for a room with six walls, there will be six variations) and choose the one with the best score. We found this heuristic search to be efficient as it searches in a small discrete set. We do not train with the 3D parameter regressor for the non-cuboid layout.

Perspective images: When predicting on perspective images, we skip the alignment and optimization steps, instead directly predicting corners and boundaries on the image. We also do not use the 3D regressor branch. The network predicts a 3-channel boundary layout map with ceiling-wall, wall-wall and wall-floor boundaries, and the corner map has eight channels, one for each possible corner. Since perspective images have smaller fields of view and the number of visible corners varies, we add a small decoding branch that predicts the room layout type, similar to RoomNet [16]. The predictor has 4 fully-connected (fc) layers with 1024, 256, 64 and 11 nodes, with ReLU operations in between. The predicted layout type then determines which corners are detected, and
the corners are localized as the most probable positions in the corner maps. We use cross entropy loss to jointly train the layout boundary and corner predictors. To ease training, similar to the procedure in Sec. 3.3, we first train the boundary/corner predictors, and then add the type predictor branch and train all components together.

Table 1. Quantitative results on cuboid layout estimation from panoramas using the dataset of [33]. We compare with the method of [33] and include an ablation analysis on a variety of configurations of our method. Bold numbers indicate the best performance when training on the data of [33].
Columns: Method; 3D IoU (%); Corner error (%); Pixel error (%).
Rows: [33]; ours (corner); ours (corner+boundary); ours full (corner+boundary+3d); ours w/o alignment; ours w/o cuboid constraint; ours w/o layout optimization; ours w/ L2 loss; ours full w/ Stnfd. 2D-3D data.

Table 2. Average CPU time for each method. We evaluate the methods on the dataset of [33] using Matlab on a Linux machine with an Intel Xeon 3.5 GHz (6 cores).
Columns: Method; Average CPU time (s).
Rows: [33] (> 300); ours full (corner+boundary+3d); ours w/o alignment; ours w/o cuboid constraint; ours w/o layout optimization.

4. Experiments

We implement our LayoutNet with Torch and test on a single NVIDIA Titan X GPU. The layout optimization is implemented with Matlab R2015a and is performed on a Linux machine with an Intel Xeon 3.5 GHz in CPU mode. We demonstrate the effectiveness of our approach on the following tasks: 1) predict 3D cuboid layout from a single panorama, 2) estimate 3D non-cuboid Manhattan layout from a single panorama, and 3) estimate layout from a single perspective image. We train only on the training split of each public dataset and tune the hyper-parameters on the validation set. We report results on the test set. Our final corner/boundary prediction from the LayoutNet is averaged over the results obtained with the original panoramas/images and the left-right flipped ones as input.
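The flip averaging mentioned above can be sketched as follows for a single-channel map stored as a 2D list. The helper names and the `predict` callable are hypothetical stand-ins for a network forward pass; the key step is flipping the second prediction back before averaging so that both maps are in the same image coordinates.

```python
def flip_lr(image):
    """Left-right flip of a 2D list (rows of pixel values)."""
    return [row[::-1] for row in image]


def predict_with_flip_averaging(predict, image):
    """Average the prediction on the image with the un-flipped prediction on its mirror."""
    direct = predict(image)
    # Predict on the mirrored input, then mirror the result back.
    mirrored = flip_lr(predict(flip_lr(image)))
    return [
        [(a + b) / 2.0 for a, b in zip(row_direct, row_mirrored)]
        for row_direct, row_mirrored in zip(direct, mirrored)
    ]


def toy_predict(image):
    """Hypothetical, deliberately asymmetric 'network': adds the column index."""
    return [[value + col for col, value in enumerate(row)] for row in image]


print(predict_with_flip_averaging(toy_predict, [[0, 1, 2]]))
```

With the toy predictor the column-dependent bias is averaged between the two orientations, which is the kind of left-right asymmetry this test-time step is meant to reduce.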
Please find more results in the supplemental materials.

4.1. Cuboid layout for panorama

We evaluate our approach on three standard metrics:
1. 3D Intersection over Union (IoU), calculated between our predicted 3D layout and the ground truth and averaged across all images;
2. Corner error, the L2 distance between the predicted room corners and the ground truth, normalized by the image diagonal and averaged across all images;
3. Pixel error, the pixel-wise accuracy between the layout and the ground truth, averaged across all images.

Table 3. Evaluation on our labeled Stanford 2D-3D annotation dataset. We evaluate our LayoutNet approach with various configurations for ablation study. Bold numbers indicate best performance when training only on the Stanford 2D-3D training set.
Columns: Method; 3D IoU (%); Corner error (%); Pixel error (%).
Rows: ours (corner); ours (corner+boundary); ours full (corner+boundary+3d); ours w/o alignment; ours w/o cuboid constraint; ours w/o layout optimization; ours w/ L2 loss; ours full w/ data.

We run our method with the same hyper-parameters on the following two datasets.

Dataset of [33]: The dataset [33] contains 500 annotated cuboid layouts of indoor environments such as bedrooms and living rooms. Since there is no existing validation set, we carefully split 10% validation images from the training samples so that similar rooms do not appear in the training split. Table 1 shows the quantitative comparison of our method, denoted as ours full (corner+boundary+3d), with the state-of-the-art cuboid layout estimation by Zhang et al. [33]. Note that [33] incorporates object detection as a factor for layout estimation. Our LayoutNet directly recovers layouts and outperforms the state-of-the-art on all three metrics. Figure 3 shows the qualitative comparison. Our approach localizes layout boundaries better, especially occluded boundaries, and is much faster, as shown in Table 2.
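As a hedged sketch of the corner error and pixel error metrics listed in this section, the code below computes both for a single image. It assumes that predicted and ground truth corners are already matched in the same order, that pixel error is reported as the percentage of pixels whose surface label disagrees, and that corners are `(x, y)` pixel coordinates; the function names are illustrative.

```python
import math


def corner_error(pred_corners, gt_corners, height, width):
    """Mean L2 corner distance as a percentage of the image diagonal."""
    diagonal = math.hypot(height, width)
    distances = [math.dist(p, g) for p, g in zip(pred_corners, gt_corners)]
    return 100.0 * (sum(distances) / len(distances)) / diagonal


def pixel_error(pred_labels, gt_labels):
    """Percentage of pixels whose predicted surface label differs from ground truth."""
    total = 0
    wrong = 0
    for pred_row, gt_row in zip(pred_labels, gt_labels):
        for pred, gt in zip(pred_row, gt_row):
            total += 1
            wrong += pred != gt
    return 100.0 * wrong / total


# Toy example on a 30 x 40 image with two corners and a 2 x 2 label map.
print(corner_error([(0, 0), (3, 4)], [(0, 0), (0, 0)], height=30, width=40))
print(pixel_error([[0, 1], [1, 1]], [[0, 1], [0, 1]]))
```

Dataset-level numbers would then be the average of these per-image values across all test images, as the metric definitions state.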
Our labeled Stanford 2D-3D annotation dataset: The dataset contains 1413 equirectangular RGB panoramas collected in 6 large-scale indoor environments, including offices, classrooms, and open spaces like corridors. Since the dataset does not contain applicable layout annotations, we extend the annotations with carefully labeled 3D cuboid-shaped layouts, providing 571 RGB panoramas with room layout annotations. We evaluate our LayoutNet quantitatively in Table 3 and qualitatively in Figure 4. Although the Stanford 2D-3D annotation dataset is more challenging, with a smaller vertical field of view (FOV) and more occlusions on the wall-floor boundaries, our LayoutNet recovers the 3D layouts well.

Ablation study: We show, in Table 1 and Table 3, the performance given the different configurations of our approach: 1) with only room corner prediction, denoted as
ours (corner); 2) joint prediction of corner and boundary, denoted as ours (corner+boundary); 3) our full approach with 3D layout loss, denoted as ours full (corner+boundary+3d); 4) our full approach trained on a combined dataset; 5) our full approach without the alignment step; 6) our full approach without the cuboid constraint; 7) our full approach without the layout optimization step; and 8) our full approach using L2 loss for boundary/corner prediction instead of cross entropy loss. Our experiments show that the full approach that incorporates all configurations performs better across all the metrics. Using cross entropy loss appears to perform better than using L2. Training with the 3D regressor has a small impact, which is part of the reason we do not use it for perspective images. Table 2 shows the average runtimes for different configurations.

Figure 3. Qualitative results (randomly sampled) for cuboid layout prediction on the dataset of [33]. We show both our method's performance (even columns) and the state-of-the-art [33] (odd columns). Each image shows the predicted layout from the given method (orange lines) and the ground truth layout (green lines). Our method is very accurate on the pixel level, but as the IoU measure shows in our quantitative results, the 3D layout can be sensitive to even small 2D prediction errors. Best viewed in color.

Figure 4. Qualitative results (randomly sampled) for cuboid layout prediction on the Stanford 2D-3D annotation dataset. This dataset is more challenging than the dataset of [33], due to a smaller vertical field of view and more occlusion. We show our method's predicted layout (orange lines) compared with the ground truth layout (green lines). Best viewed in color.

Table 4. Depth distribution error compared with Yang et al. [31].
Columns: Method; L2 dist; cosine dist.
Rows: ours; Yang et al. [31].

Comparison to other approaches: We compare with Yang et al. based on their depth distribution metric.
We directly run our full cuboid layout prediction (deep net trained on + optimization) on 88 indoor panoramas collected by Yang et al. As shown in Table 4, our approach outperforms Yang et al. in L2 distance and is slightly worse in cosine distance. Another approach, Pano2CAD [30], has not made its source code available and has no evaluation on layout, making direct comparison difficult. For time
consumption, Yang et al. report less than 1 minute, and Pano2CAD takes 30s to process one room. One forward pass of LayoutNet takes 39ms. In CPU mode (w/o parallel for loop) using Matlab R2015a, our cuboid constraint takes 0.52s, alignment 13.73s, and layout optimization 30.5s.

4.2. Non-cuboid layout for panorama

Figure 6 shows qualitative results of our approach on reconstructing non-cuboid Manhattan layouts from a single panorama. Due to the limited number of non-cuboid room layouts in the existing datasets, we captured several images using a Ricoh Theta-S 360° camera. Our approach is able to predict 3D room layouts with complex shapes that are difficult for existing methods.

Figure 6. Qualitative results for non-cuboid layout prediction. We show our method's predicted layout (orange lines) for non-cuboid layouts such as L-shaped rooms. Best viewed in color.

4.3. Perspective images

We use the same experimental setting as in [4, 16]. We train our modified approach to jointly predict room type on the training split of the LSUN layout estimation challenge. We do not train on the validation split. Table 5 shows our performance compared with the state-of-the-art on Hedau's dataset [11]. Our method ranks second among the methods. Our method takes 39ms (25 FPS) to process a perspective image, faster than the 52ms (19 FPS) of RoomNet basic [16] or 168ms (6 FPS) of RoomNet recurrent, under the same hardware configuration. We report the result on the LSUN dataset in the supplemental material. Figure 5 shows qualitative results on the LSUN validation split. Failure cases include room type prediction error (last row, right column) and heavy occlusion from limited field of view (last row, left column).

Figure 5. Qualitative results for perspective images. We show the input RGB image, our predicted boundary/corner map and the final estimated layout (orange lines) compared with ground truth (green lines). Best viewed in color. (Panel labels: Input RGB, LayoutNet boundary, LayoutNet corner, LayoutNet result.)

Table 5. Performance on the Hedau dataset [11]. We show the top 5 results; LayoutNet ranks second to RoomNet recurrent 3-iter in Pixel Error (%).
Columns: Method; Pixel Error (%).
Rows: Schwing et al. [27]; Del Pero et al. [6]; Dasgupta et al. [4]; LayoutNet (ours); RoomNet recurrent 3-iter [16].

5. Conclusion

We propose LayoutNet, an algorithm that predicts room layout from a single panorama or perspective image. Our approach relaxes the commonly assumed cuboid layout limitation and works well with non-cuboid layouts (e.g. L-shaped rooms). We demonstrate how pre-aligning based on vanishing points and Manhattan constraints substantially improve the quantitative results. Our method operates directly on panoramic images (rather than decomposing into perspective images) and is among the state-of-the-art for the perspective image task. Future work includes extending to handle arbitrary room layouts, incorporating object detection for better estimating room shapes, and recovering a complete 3D indoor model from single images.

Acknowledgements

This research is supported in part by NSF award , ONR MURI grant N , and Zillow Group. We thank Zongyi Wang for his invaluable help with panorama annotation.

References

[1] I. Armeni, S. Sax, A. R. Zamir, and S. Savarese. Joint 2d-3d-semantic data for indoor scene understanding. arXiv.
[2] R. Cabral and Y. Furukawa. Piecewise planar and compact floorplan reconstruction from images. In CVPR.
[3] J. M. Coughlan and A. L. Yuille. Manhattan world: Compass direction from a single image by bayesian inference. In ICCV, volume 2. IEEE.
[4] S. Dasgupta, K. Fang, K. Chen, and S. Savarese. Delay: Robust spatial layout estimation for cluttered indoor scenes. In CVPR.
[5] L. Del Pero, J. Bowdish, D. Fried, B. Kermgard, E. Hartley, and K. Barnard. Bayesian geometric modeling of indoor scenes. In CVPR.
[6] L. Del Pero, J. Bowdish, B. Kermgard, E. Hartley, and K. Barnard. Understanding bayesian rooms using composite 3d object models. In CVPR.
[7] E. Delage, H. Lee, and A. Y. Ng. A dynamic bayesian network model for autonomous 3d reconstruction from a single indoor image. In CVPR, volume 2. IEEE.
[8] D. Farin, W. Effelsberg, et al. Floor-plan reconstruction from panoramic images. In ACM Multimedia. ACM.
[9] P. V. Group. LSUN challenge on room layout. leaderboard/index_2016.html#roomlayout.
[10] R. Guo, C. Zou, and D. Hoiem. Predicting complete 3d models of indoor scenes. arXiv.
[11] V. Hedau, D. Hoiem, and D. Forsyth. Recovering the spatial layout of cluttered rooms. In ICCV.
[12] V. Hedau, D. Hoiem, and D. Forsyth. Thinking inside the box: Using appearance models and context based on room geometry. In ECCV.
[13] D. Hoiem, A. A. Efros, and M. Hebert. Geometric context from a single image. In ICCV, volume 1. IEEE.
[14] H. Izadinia, Q. Shan, and S. M. Seitz. IM2CAD. In CVPR.
[15] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. CoRR.
[16] C.-Y. Lee, V. Badrinarayanan, T. Malisiewicz, and A. Rabinovich. Roomnet: End-to-end room layout estimation. arXiv.
[17] D. Lee, A. Gupta, M. Hebert, and T. Kanade. Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. In NIPS.
[18] D. C. Lee, M. Hebert, and T. Kanade. Geometric reasoning for single image structure recovery. In CVPR. IEEE.
[19] C. Liu, P. Kohli, and Y. Furukawa. Layered scene decomposition via the occlusion-crf. In CVPR.
[20] C. Liu, A. G. Schwing, K. Kundu, R. Urtasun, and S. Fidler. Rent3d: Floor-plan priors for monocular layout estimation. In CVPR.
[21] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR.
[22] A. Mallya and S. Lazebnik. Learning informative edge maps for indoor scene layout prediction. In ICCV.
[23] S. Ramalingam, J. K. Pillai, A. Jain, and Y. Taguchi. Manhattan junction catalogue for spatial reasoning of indoor scenes. In CVPR.
[24] G. Randall, J. Jakubowicz, R. G. von Gioi, and J.-M. Morel. Lsd: A fast line segment detector with a false detection control. IEEE Transactions on Pattern Analysis & Machine Intelligence, 32.
[25] Y. Ren, C. Chen, S. Li, and C. J. Kuo. A coarse-to-fine indoor layout estimation (CFILE) method. arXiv.
[26] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In MICCAI, volume 9351 of LNCS. Springer.
[27] A. G. Schwing, T. Hazan, M. Pollefeys, and R. Urtasun. Efficient structured prediction for 3d indoor scene understanding. In CVPR.
[28] A. G. Schwing and R. Urtasun. Efficient exact inference for 3d indoor scene understanding. In ECCV.
[29] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from rgbd images. In ECCV.
[30] J. Xu, B. Stenger, T. Kerola, and T. Tung. Pano2CAD: Room layout from a single panorama image. In WACV.
[31] H. Yang and H. Zhang. Efficient 3d room shape recovery from a single panorama. In CVPR.
[32] J. Zhang, C. Kan, A. G. Schwing, and R. Urtasun. Estimating the 3d layout of indoor scenes and its clutter from depth sensors. In ICCV.
[33] Y. Zhang, S. Song, P. Tan, and J. Xiao. Panocontext: A whole-room 3d context model for panoramic scene understanding. In ECCV.
[34] Y. Zhao and S.-C. Zhu. Scene parsing by integrating function, geometry and appearance models. In CVPR.
[35] C. Zhu, R. H. Byrd, P. Lu, and J. Nocedal. Algorithm 778: L-bfgs-b: Fortran subroutines for large-scale bound-constrained optimization. ACM Transactions on Mathematical Software (TOMS), 23(4).

A. Quantitative Results on LSUN layout Challenge [9]

Table 6 shows our performance compared with the state-of-the-art on the LSUN dataset [9]. Our method ranks second in Keypoint Error (%) and third in Pixel Error (%) among the methods. We also report results of the RoomNet basic approach [16], which does not apply recurrent refinement and is closer in design to our approach. The lower accuracy in pixel error mainly results from our simplified room keypoint representation. Unlike RoomNet [16], which assumes that all keypoints are distinct across room types, LayoutNet directly predicts the 8 keypoints and selects among them based on the room type to produce the final prediction. Applying the layout optimization step explained in the paper could further improve our performance on the perspective image task.

Table 6. Performance on LSUN dataset [9]. LayoutNet ranks second to RoomNet recurrent 3-iter in Keypoint Error (%) and ranks third in Pixel Error (%). We also report the RoomNet basic approach, which does not apply the recurrent refinement step. Methods compared (Keypoint Error (%), Pixel Error (%)): Hedau et al. [11], Mallya et al. [22], Dasgupta et al. [4], LayoutNet (ours), RoomNet recurrent 3-iter [16], RoomNet basic [16].

B. More Qualitative Results

B.1. Non-cuboid layout from panorama

We show more qualitative results of non-cuboid room layout reconstruction from a single panorama in Figure 7. We use samples from the dataset collected by Yang et al. [31]. We exclude samples that overlap with the dataset [33].

B.2. Cuboid layout from panorama

We show more qualitative results on dataset [33] in Figure 8 and Figure 9, where we compare our method with the state-of-the-art. We show more qualitative results on our labeled Stanford 2D-3D annotation dataset, compared with our ground truth annotation, in Figure 10 and Figure 11.

B.3. Perspective images

We show more qualitative results on the LSUN layout Challenge [9] compared with the ground truth annotation, as shown in Figure
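The two measures reported in Table 6 can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the official LSUN evaluation code: it assumes keypoint error is the mean Euclidean distance between corresponding predicted and annotated keypoints, normalized by the image diagonal, and that pixel error is the fraction of pixels whose surface label differs from the ground truth. The function names `keypoint_error` and `pixel_error` are hypothetical, and the sketch assumes keypoints and surface labels have already been put in correspondence.

```python
import numpy as np


def keypoint_error(pred, gt, height, width):
    """Mean keypoint distance as a percentage of the image diagonal.

    Assumption: pred and gt are (N, 2) arrays of (x, y) pixel coordinates
    listed in the same order, so row i of pred corresponds to row i of gt.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    diagonal = np.hypot(height, width)
    distances = np.linalg.norm(pred - gt, axis=1)
    return 100.0 * distances.mean() / diagonal


def pixel_error(pred_labels, gt_labels):
    """Percentage of pixels whose predicted surface label is wrong.

    Assumption: both inputs are integer label maps of identical shape and
    the label ids already refer to the same surfaces in both maps.
    """
    pred_labels = np.asarray(pred_labels)
    gt_labels = np.asarray(gt_labels)
    if pred_labels.shape != gt_labels.shape:
        raise ValueError("label maps must have the same shape")
    return 100.0 * np.mean(pred_labels != gt_labels)


if __name__ == "__main__":
    # Toy data, chosen only to make the arithmetic easy to follow.
    print(keypoint_error([[0, 0], [3, 4]], [[0, 0], [0, 0]], height=3, width=4))
    print(pixel_error([[0, 1], [1, 1]], [[0, 1], [0, 1]]))
```

Both functions return percentages, matching the units of the Keypoint Error (%) and Pixel Error (%) columns. An evaluation that first has to match label ids or keypoint order between prediction and ground truth would need an extra assignment step that this sketch leaves out.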

Figure 7. Qualitative results for non-cuboid layout prediction. We show our method's predicted layout (orange lines) for non-cuboid layouts such as L-shaped rooms. Best viewed in color.

Figure 8. Qualitative results for cuboid layout prediction on dataset [33]. We show both our method's performance (even columns) and the state-of-the-art [33] (odd columns). Each image consists of the predicted layout from the given method (orange lines) and the ground truth layout (green lines). Best viewed in color.

Figure 9. Qualitative results for cuboid layout prediction on dataset [33]. We show both our method's performance (even columns) and the state-of-the-art [33] (odd columns). Each image consists of the predicted layout from the given method (orange lines) and the ground truth layout (green lines). Best viewed in color.

Figure 10. Qualitative results (randomly sampled) for cuboid layout prediction on the Stanford 2D-3D annotation dataset. This dataset is more challenging than the dataset, due to a smaller vertical field of view and more occlusion. We show our method's predicted layout (orange lines) compared with the ground truth layout (green lines). Best viewed in color.

Figure 11. Qualitative results (randomly sampled) for cuboid layout prediction on the Stanford 2D-3D annotation dataset. This dataset is more challenging than the dataset, due to a smaller vertical field of view and more occlusion. We show our method's predicted layout (orange lines) compared with the ground truth layout (green lines). Best viewed in color.

Figure 12. Qualitative results for perspective images. We show the input RGB image, our predicted boundary/corner map and the final estimated layout (orange lines) compared with ground truth (green lines). Best viewed in color. (Panel columns: Input RGB, LayoutNet boundary, LayoutNet corner, LayoutNet result.)


More information

Motion Tracking and Event Understanding in Video Sequences

Motion Tracking and Event Understanding in Video Sequences Motion Tracking and Event Understanding in Video Sequences Isaac Cohen Elaine Kang, Jinman Kang Institute for Robotics and Intelligent Systems University of Southern California Los Angeles, CA Objectives!

More information

Dynamic visual understanding of the local environment for an indoor navigating robot

Dynamic visual understanding of the local environment for an indoor navigating robot Dynamic visual understanding of the local environment for an indoor navigating robot Grace Tsai and Benjamin Kuipers Abstract We present a method for an embodied agent with vision sensor to create a concise

More information

arxiv: v1 [cs.cv] 16 Nov 2015

arxiv: v1 [cs.cv] 16 Nov 2015 Coarse-to-fine Face Alignment with Multi-Scale Local Patch Regression Zhiao Huang hza@megvii.com Erjin Zhou zej@megvii.com Zhimin Cao czm@megvii.com arxiv:1511.04901v1 [cs.cv] 16 Nov 2015 Abstract Facial

More information

Self Lane Assignment Using Smart Mobile Camera For Intelligent GPS Navigation and Traffic Interpretation

Self Lane Assignment Using Smart Mobile Camera For Intelligent GPS Navigation and Traffic Interpretation For Intelligent GPS Navigation and Traffic Interpretation Tianshi Gao Stanford University tianshig@stanford.edu 1. Introduction Imagine that you are driving on the highway at 70 mph and trying to figure

More information

Visual localization using global visual features and vanishing points

Visual localization using global visual features and vanishing points Visual localization using global visual features and vanishing points Olivier Saurer, Friedrich Fraundorfer, and Marc Pollefeys Computer Vision and Geometry Group, ETH Zürich, Switzerland {saurero,fraundorfer,marc.pollefeys}@inf.ethz.ch

More information

Supplementary Material: Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos

Supplementary Material: Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos Supplementary Material: Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos Kihyuk Sohn 1 Sifei Liu 2 Guangyu Zhong 3 Xiang Yu 1 Ming-Hsuan Yang 2 Manmohan Chandraker 1,4 1 NEC Labs

More information

DEPT: Depth Estimation by Parameter Transfer for Single Still Images

DEPT: Depth Estimation by Parameter Transfer for Single Still Images DEPT: Depth Estimation by Parameter Transfer for Single Still Images Xiu Li 1, 2, Hongwei Qin 1,2, Yangang Wang 3, Yongbing Zhang 1,2, and Qionghai Dai 1 1. Dept. of Automation, Tsinghua University 2.

More information

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab. [ICIP 2017] Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab., POSTECH Pedestrian Detection Goal To draw bounding boxes that

More information

PanoContext: A Whole-room 3D Context Model for Panoramic Scene Understanding

PanoContext: A Whole-room 3D Context Model for Panoramic Scene Understanding PanoContext: A Whole-room D Context Model for Panoramic Scene Understanding Yinda Zhang Shuran Song Ping Tan Jianxiong Xiao Princeton University Simon Fraser University http://panocontext.cs.princeton.edu

More information

Joint Vanishing Point Extraction and Tracking (Supplementary Material)

Joint Vanishing Point Extraction and Tracking (Supplementary Material) Joint Vanishing Point Extraction and Tracking (Supplementary Material) Till Kroeger1 1 Dengxin Dai1 Luc Van Gool1,2 Computer Vision Laboratory, D-ITET, ETH Zurich 2 VISICS, ESAT/PSI, KU Leuven {kroegert,

More information

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing Supplementary Material

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing Supplementary Material Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing Supplementary Material Chi Li, M. Zeeshan Zia 2, Quoc-Huy Tran 2, Xiang Yu 2, Gregory D. Hager, and Manmohan Chandraker 2 Johns

More information

Fully Convolutional Networks for Semantic Segmentation

Fully Convolutional Networks for Semantic Segmentation Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell UC Berkeley Chaim Ginzburg for Deep Learning seminar 1 Semantic Segmentation Define a pixel-wise labeling

More information

arxiv: v2 [cs.cv] 14 May 2018

arxiv: v2 [cs.cv] 14 May 2018 ContextVP: Fully Context-Aware Video Prediction Wonmin Byeon 1234, Qin Wang 1, Rupesh Kumar Srivastava 3, and Petros Koumoutsakos 1 arxiv:1710.08518v2 [cs.cv] 14 May 2018 Abstract Video prediction models

More information

Viewpoint Invariant Features from Single Images Using 3D Geometry

Viewpoint Invariant Features from Single Images Using 3D Geometry Viewpoint Invariant Features from Single Images Using 3D Geometry Yanpeng Cao and John McDonald Department of Computer Science National University of Ireland, Maynooth, Ireland {y.cao,johnmcd}@cs.nuim.ie

More information

CAP 6412 Advanced Computer Vision

CAP 6412 Advanced Computer Vision CAP 6412 Advanced Computer Vision http://www.cs.ucf.edu/~bgong/cap6412.html Boqing Gong April 21st, 2016 Today Administrivia Free parameters in an approach, model, or algorithm? Egocentric videos by Aisha

More information

OCCLUSION BOUNDARIES ESTIMATION FROM A HIGH-RESOLUTION SAR IMAGE

OCCLUSION BOUNDARIES ESTIMATION FROM A HIGH-RESOLUTION SAR IMAGE OCCLUSION BOUNDARIES ESTIMATION FROM A HIGH-RESOLUTION SAR IMAGE Wenju He, Marc Jäger, and Olaf Hellwich Berlin University of Technology FR3-1, Franklinstr. 28, 10587 Berlin, Germany {wenjuhe, jaeger,

More information

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION Kingsley Kuan 1, Gaurav Manek 1, Jie Lin 1, Yuan Fang 1, Vijay Chandrasekhar 1,2 Institute for Infocomm Research, A*STAR, Singapore 1 Nanyang Technological

More information

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Si Chen The George Washington University sichen@gwmail.gwu.edu Meera Hahn Emory University mhahn7@emory.edu Mentor: Afshin

More information

Semantic Segmentation

Semantic Segmentation Semantic Segmentation UCLA:https://goo.gl/images/I0VTi2 OUTLINE Semantic Segmentation Why? Paper to talk about: Fully Convolutional Networks for Semantic Segmentation. J. Long, E. Shelhamer, and T. Darrell,

More information

arxiv: v1 [cs.cv] 28 Sep 2018

arxiv: v1 [cs.cv] 28 Sep 2018 Camera Pose Estimation from Sequence of Calibrated Images arxiv:1809.11066v1 [cs.cv] 28 Sep 2018 Jacek Komorowski 1 and Przemyslaw Rokita 2 1 Maria Curie-Sklodowska University, Institute of Computer Science,

More information

Supplementary Materials for Learning to Parse Wireframes in Images of Man-Made Environments

Supplementary Materials for Learning to Parse Wireframes in Images of Man-Made Environments Supplementary Materials for Learning to Parse Wireframes in Images of Man-Made Environments Kun Huang, Yifan Wang, Zihan Zhou 2, Tianjiao Ding, Shenghua Gao, and Yi Ma 3 ShanghaiTech University {huangkun,

More information

Segmentation and Tracking of Partial Planar Templates

Segmentation and Tracking of Partial Planar Templates Segmentation and Tracking of Partial Planar Templates Abdelsalam Masoud William Hoff Colorado School of Mines Colorado School of Mines Golden, CO 800 Golden, CO 800 amasoud@mines.edu whoff@mines.edu Abstract

More information

Multi-view stereo. Many slides adapted from S. Seitz

Multi-view stereo. Many slides adapted from S. Seitz Multi-view stereo Many slides adapted from S. Seitz Beyond two-view stereo The third eye can be used for verification Multiple-baseline stereo Pick a reference image, and slide the corresponding window

More information

Semantic Segmentation. Zhongang Qi

Semantic Segmentation. Zhongang Qi Semantic Segmentation Zhongang Qi qiz@oregonstate.edu Semantic Segmentation "Two men riding on a bike in front of a building on the road. And there is a car." Idea: recognizing, understanding what's in

More information

Supplementary: Cross-modal Deep Variational Hand Pose Estimation

Supplementary: Cross-modal Deep Variational Hand Pose Estimation Supplementary: Cross-modal Deep Variational Hand Pose Estimation Adrian Spurr, Jie Song, Seonwook Park, Otmar Hilliges ETH Zurich {spurra,jsong,spark,otmarh}@inf.ethz.ch Encoder/Decoder Linear(512) Table

More information

arxiv: v1 [cs.cv] 25 Oct 2017

arxiv: v1 [cs.cv] 25 Oct 2017 ZOU, LI, HOIEM: COMPLETE 3D SCENE PARSING FROM SINGLE RGBD IMAGE 1 arxiv:1710.09490v1 [cs.cv] 25 Oct 2017 Complete 3D Scene Parsing from Single RGBD Image Chuhang Zou http://web.engr.illinois.edu/~czou4/

More information

Structured Prediction using Convolutional Neural Networks

Structured Prediction using Convolutional Neural Networks Overview Structured Prediction using Convolutional Neural Networks Bohyung Han bhhan@postech.ac.kr Computer Vision Lab. Convolutional Neural Networks (CNNs) Structured predictions for low level computer

More information

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia Deep learning for dense per-pixel prediction Chunhua Shen The University of Adelaide, Australia Image understanding Classification error Convolution Neural Networks 0.3 0.2 0.1 Image Classification [Krizhevsky

More information

SURGE: Surface Regularized Geometry Estimation from a Single Image

SURGE: Surface Regularized Geometry Estimation from a Single Image SURGE: Surface Regularized Geometry Estimation from a Single Image Peng Wang 1 Xiaohui Shen 2 Bryan Russell 2 Scott Cohen 2 Brian Price 2 Alan Yuille 3 1 University of California, Los Angeles 2 Adobe Research

More information