
Deep Learning Based 3D Reconstruction of Indoor Scenes

Student Name: Srinidhi Hegde
Roll Number:

BTP report submitted in partial fulfillment of the requirements for the Degree of B.Tech. in Computer Science & Engineering on 18th April 2017

BTP Track: Research Track

BTP Advisors:
Dr. Saket Anand, Assistant Professor (CSE), IIIT-D
Dr. Ojaswa Sharma, Assistant Professor (CSE), IIIT-D

Indraprastha Institute of Information Technology, New Delhi

Student's Declaration

I hereby declare that the work presented in the report entitled "Learning Based 3D Reconstruction", submitted by me for the partial fulfillment of the requirements for the degree of Bachelor of Technology in Computer Science & Engineering at Indraprastha Institute of Information Technology, Delhi, is an authentic record of my work carried out under the guidance of Dr. Saket Anand and Dr. Ojaswa Sharma. Due acknowledgements have been given in the report to all material used. This work has not been submitted anywhere else for the award of any other degree.

Place & Date: ...                                          Srinidhi Hegde

Certificate

This is to certify that the above statement made by the candidate is correct to the best of my knowledge.

Place & Date: ...                                          Dr. Saket Anand

Place & Date: ...                                          Dr. Ojaswa Sharma

Abstract

Recent advancements in deep learning techniques have opened doors for a wide variety of applications. With growing interest in deep learning and geometry, many computer vision problems have been tackled using deep learning. In this work, we create a framework for learning-based 3D reconstruction of building interiors from multiple 2D images that capture the entire scene of interest. We use PoseNet to regress the camera pose and thereby establish spatial relationships between the constituents of a scene. This work is a step towards solving the bigger problem of reconstruction from incomplete data of the scene.

Keywords: 3D reconstruction, relocalization, deep learning, Convolutional Neural Network

Acknowledgments

I would like to express my special thanks of gratitude to my advisors, Dr. Saket Anand and Dr. Ojaswa Sharma, for their guidance and constant help. I am also grateful for their timely help and untiring effort in providing necessary and useful inputs. I would like to express my immense gratitude to my parents for their constant support and motivation. I would also like to thank IIIT Delhi for providing me with this wonderful opportunity and for access to all the resources and facilities used by me. I have put effort into this project; however, it would not have been possible without the kind support and help of many individuals. I would like to extend my sincere thanks to all of them.

Work Distribution

The work distribution is provided in terms of weekly completion of tasks.

Week 1 to Week 5: Literature Survey
Week 6 to Week 8: Data Generation and Modelling Lab
Week 9 to Week 12: Data Visualization
Week 13 to Week 14: PoseNet Training

Contents

1 Introduction
  1.1 Motivation
  1.2 3D-Reconstruction Problem
    1.2.1 Feature Points Identification and Correspondences
    1.2.2 Depth Estimation
    1.2.3 3D Model Registration
  1.3 Incorporating Learning with 3D-Reconstruction
2 Learning Approaches for Geometry and Reconstruction
  2.1 Learning Depth, Normals and Semantic Labels using Deep Learning
    2.1.1 Depth Estimation
    2.1.2 Surface Normal Estimation
    2.1.3 Semantic Segmentation
  2.2 PoseNet: A CNN for Real-Time 6-DOF Camera Relocalization
3 Creating Framework
  3.1 Data Collection and Processing
    3.1.1 Experimental Setup and Procedure
    3.1.2 Challenges
    3.1.3 Results
4 Summary and Overview
  4.1 Summary from Previous Part
  4.2 Overview
5 Deep Learning for Elements of 3D Reconstruction
  5.1 SceneNet - Synthetic Dataset
  5.2 Object Pose Estimation Using Siamese Network
6 Pipeline for 3D Reconstruction
  6.1 About the Pipeline
  6.2 Experiments and Results
7 Future Works
  7.1 Model Improvement
  7.2 Removing Constraints - Exploring Incomplete Data

PART - I (Monsoon Semester, 2016)

Chapter 1
Introduction

1.1 Motivation

3D reconstruction is a problem that enables rapid prototyping of geometrical models in 3D, and is thus an essential part of scene understanding. When combined with 3D reconstruction, learning can help automate a process that is tiresome to carry out manually. Pose estimation is equally important for 3D reconstruction, as it helps establish spatial relationships between different geometric entities in 3D. Automating pose estimation from images can therefore greatly reduce manual intervention.

3D reconstruction has a wide variety of applications in various domains and is an integral part of fields such as robotics, augmented reality and virtual reality. Automation in generating models helps with relocalization and path planning in robotics in real time. Faster reconstruction enables real-time rendering of geometric models, which is useful for various AR/VR applications. Extending this problem to reconstruction from incomplete data could help solve the real-time puzzle-assembly problem, a challenging problem in the domain of artificial intelligence.

1.2 3D-Reconstruction Problem

The problem of 3D reconstruction is one of the most challenging problems in the domain of computer vision. As the name suggests, it involves recreating 3D models out of incomplete data, and it applies to different forms of data. For 2D images, for example, it can be thought of as the inverse of the process of obtaining images from 3D real-world objects (photography and imaging). As is evident from the nature of the problem, one of the challenges lies in accurately and effectively recovering the depth information that was lost in forming the 2D images. 3D reconstruction can also be applied to retrieve a complete 3D model from an incomplete 3D point cloud. In this scenario, it can be challenging to estimate the exact transformations of the different parts of the model in order to register the parts into a composite final model. In the problem discussed here, the focus is both on 3D reconstruction from multiple 2D images and on registration among the models generated from different images. Some of the key concepts essential for solving the 3D reconstruction problem are discussed in the following subsections.

1.2.1 Feature Points Identification and Correspondences

For estimating the pose and transformation between two image pairs or two geometric model pairs, it is essential to identify feature points that are invariant to affine transformations. These help in uniquely identifying a point in an image or a 3D model.

1.2.2 Depth Estimation

Depth estimation involves generating a depth map from images: an image in which each pixel intensity represents the depth of the visible object from the camera. Conventional techniques for depth estimation involve a multi-view approach in which epipolar geometry is employed to estimate depth from multiple images.

1.2.3 3D Model Registration

Registration of 3D geometric models deals with the alignment of models or point clouds (sets of points in 3D space). The two prominent algorithms used for model registration are the Iterative Closest Point (ICP) algorithm and the Random Sample Consensus (RANSAC) algorithm.

Iterative Closest Point: ICP is one of the most widely used methods for 3D model registration. It was first proposed by Besl et al. [2]. The algorithm takes two geometric models (as sets of points), say A and B, and the point correspondences between them as input. It then estimates the transformation between A and B, chosen to minimize the distance between the corresponding points of A and B (a short code sketch of this alignment step is given at the end of this chapter).

Random Sample Consensus: RANSAC is a generic iterative framework for fitting models to unstructured data. In each iteration it samples the minimal set of data items that defines the model being fit, and it greedily updates the model to maximize the number of matching data points (inliers) [5].

1.3 Incorporating Learning with 3D-Reconstruction

The concept of learning is formally defined by Tom Mitchell in [9] as follows: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." Some advanced learning approaches have recently been employed for geometric understanding of scenes. In general, learning is a powerful tool that enables us to automate many of the processes involved in 3D reconstruction. From the visual and geometric cues in images, an algorithm can learn key features such as depth, position and orientation for better scene understanding. In recent years, deep neural network frameworks have been used to estimate depth, surface normals and semantic relations from the visual cues of a single image, as described in detail in Section 2.1. Furthermore, relocalization is an important problem that can also be learnt using deep learning frameworks (as discussed in Section 2.2).
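The alignment step inside each ICP iteration has a closed-form solution via the SVD. The following is a minimal sketch of the procedure from Section 1.2.3 in Python/NumPy, assuming nearest neighbours serve as correspondences and restricting to rigid (rotation plus translation) transforms; the function names are illustrative and not from any library used in this work.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(A, B):
    """Least-squares rotation R and translation t mapping point set A onto B (both Nx3)."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)                      # 3x3 cross-covariance of centred sets
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T
    t = cb - R @ ca
    return R, t

def icp(source, target, iters=50, tol=1e-6):
    """Align 'source' (Nx3) to 'target' (Mx3) by alternating correspondence and alignment."""
    tree = cKDTree(target)
    src = source.copy()
    prev_err = np.inf
    for _ in range(iters):
        dist, idx = tree.query(src)                # correspondences: nearest target points
        R, t = best_rigid_transform(src, target[idx])
        src = src @ R.T + t                        # apply the incremental transform
        err = dist.mean()
        if abs(prev_err - err) < tol:              # stop when the mean error plateaus
            break
        prev_err = err
    return src
```

Because the nearest-neighbour correspondences are only locally valid, this scheme depends heavily on a good initial alignment, which is why a coarse initialization (e.g. from RANSAC, as used later in Chapter 3) is usually paired with it.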

Chapter 2
Learning Approaches for Geometry and Reconstruction

A vast amount of work has been done in the field of 3D reconstruction, but incorporating learning into the 3D reconstruction problem has only recently become of interest. Hence there are few works pertaining to the problem of learning-based 3D reconstruction. Here is a brief overview of the related work.

2.1 Learning Depth, Normals and Semantic Labels using Deep Learning

This work by Eigen et al. [4] focuses on solving three important computer vision problems, namely depth estimation, surface normal estimation and semantic segmentation, using a single neural network architecture. The architecture used is a convolutional neural network (CNN), as shown in Figure 2.1. They use a multi-scale feature extraction process, with three different CNNs processing features at three different scales, to obtain a high-resolution feature set. Although a single architecture is used, the feature extraction and training have some finer variations for the different problems.

2.1.1 Depth Estimation

For depth estimation, the scene depth is modelled as a depth map. Let the predicted and ground-truth depth maps be D and D* respectively. The loss function used for training, with d = D - D*, is

L_{depth}(D, D^*) = \frac{1}{n}\sum_i d_i^2 - \frac{1}{2n^2}\Big(\sum_i d_i\Big)^2 + \frac{1}{n}\sum_i \big[(\nabla_x d_i)^2 + (\nabla_y d_i)^2\big]   (2.1)

where the sums are over valid pixels i, n is the number of pixels, and \nabla_x d_i and \nabla_y d_i are the horizontal and vertical image gradients of the difference. On the coarser scale, the network extracts depth information based on global geometric features of the image, such as vanishing points, object poses and alignment of structures, whereas on the finer scales it extracts local depth information, such as depth fluctuations due to object texture.
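Equation (2.1) translates almost directly into array operations. A small sketch in Python/NumPy, assuming forward differences for the gradient terms and a boolean mask of valid pixels; this is an illustration of the loss, not the authors' implementation:

```python
import numpy as np

def depth_loss(pred, gt, mask):
    """Scale-invariant depth loss of Eq. (2.1).
    pred, gt : HxW predicted / ground-truth depth maps
    mask     : HxW boolean map of valid pixels
    """
    d = np.where(mask, pred - gt, 0.0)
    n = mask.sum()
    # forward differences of the residual, counted only where both pixels are valid
    gx = (d[:, 1:] - d[:, :-1]) * (mask[:, 1:] & mask[:, :-1])
    gy = (d[1:, :] - d[:-1, :]) * (mask[1:, :] & mask[:-1, :])
    return ((d ** 2).sum() / n
            - 0.5 * d.sum() ** 2 / n ** 2
            + (gx ** 2).sum() / n + (gy ** 2).sum() / n)
```

The middle term is what makes the loss scale-invariant: a constant offset of the whole depth map changes the first and second terms in a way that partially cancels.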

Figure 2.1: Multi-scale CNN for multiple tasks, comprising three scales. Scale 1 predicts a coarse but spatially varying feature set covering the entire image area; Scale 2 provides finer predictions at an intermediate level of resolution; Scale 3 outputs a high-resolution, highly detailed map. Source: [4]

2.1.2 Surface Normal Estimation

Surface normals in a 3D geometric model are generally represented as an attribute (a vector in R^3) attached to the vertices of the model. Unlike depth estimation, predicting surface normals uses a three-channel output for the x, y and z components of the normal vector. A simple elementwise loss function compares the predicted normal at each pixel to the ground truth via a dot product:

L_{normals}(N, N^*) = -\frac{1}{n}\sum_i N_i \cdot N_i^* = -\frac{1}{n} N \cdot N^*   (2.2)

where N and N* are the predicted and ground-truth vector maps representing the surface normal at each pixel. The ground-truth vector map is computed with the approach proposed by Silberman et al. [11], which fits least-squares planes to a point cloud generated from the image.

2.1.3 Semantic Segmentation

The basic approach used for semantic segmentation is to estimate per-pixel semantic labels for the given image; these labels then cluster together pixels showing similar objects in the scene. The network discussed in this work estimates per-pixel semantic labels using a pixelwise softmax classifier, with the number of channels in the final output equal to the number of classes. The following pixelwise cross-entropy loss function is used for training:

L_{semantic}(C, C^*) = -\frac{1}{n}\sum_i C_i^* \log C_i   (2.3)
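The per-pixel losses (2.2) and (2.3) are similarly compact. A hedged sketch in Python/NumPy, assuming unit-norm normal maps and one-hot ground-truth labels, with the softmax of Eq. (2.4) below computed in a numerically stable form:

```python
import numpy as np

def normals_loss(pred, gt, mask):
    """Eq. (2.2): negative mean dot product; pred, gt are HxWx3 unit normal maps."""
    dots = (pred * gt).sum(axis=-1)                       # per-pixel N_i . N_i*
    return -dots[mask].mean()

def semantic_loss(logits, onehot, mask):
    """Eq. (2.3): pixelwise softmax cross-entropy; logits, onehot are HxWxK."""
    z = logits - logits.max(axis=-1, keepdims=True)       # shift for numerical stability
    log_softmax = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    ce = -(onehot * log_softmax).sum(axis=-1)             # -C* log C at each pixel
    return ce[mask].mean()
```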

Figure 2.2: GoogLeNet architecture with components. Source: [1]

where C_i, given by

C_i = \frac{e^{z_i}}{\sum_c e^{z_{i,c}}}   (2.4)

is the class prediction at pixel i given the output z of the final convolutional linear layer. The semantic labelling makes use of the depth and surface normal information obtained from the same network, as discussed in the previous sections.

2.2 PoseNet: A CNN for Real-Time 6-DOF Camera Relocalization

PoseNet [7] is a real-time system for solving the camera relocalization problem: the task of identifying the position and orientation, otherwise known as the pose, of the camera. PoseNet takes as input a 224x224 RGB image and regresses the camera's 6-DoF pose relative to the scene. The pose regressed by PoseNet is modelled as a vector p given by

p = [x, q]   (2.5)

where x is the 3D position of the camera and q is its orientation represented as a quaternion. PoseNet uses a modified GoogLeNet [10] architecture, as shown in Figure 2.2: the three softmax classifiers are replaced by affine regressors to output the pose, and another fully connected layer is inserted before the final regressor to form a localized feature vector for exploration and visualization. For implementation purposes we use PoseNet's implementation based on the Caffe library [6]. PoseNet is trained on an input image I using stochastic gradient descent with the following Euclidean loss function:

\mathrm{loss}(I) = \lVert \hat{x} - x \rVert_2 + \beta \Big\lVert \hat{q} - \frac{q}{\lVert q \rVert} \Big\rVert_2   (2.6)

where \beta is a scale factor chosen to keep the expected values of the position and orientation errors approximately equal. For better results, PoseNet is pretrained on large datasets such as ImageNet and Places; this transfer learning reduces the amount of data required for pose estimation.
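The pose loss (2.6) amounts to a weighted sum of two Euclidean distances. A minimal sketch in Python/NumPy; the value of beta shown is purely illustrative, since in [7] it is tuned per scene:

```python
import numpy as np

def pose_loss(x_pred, q_pred, x_gt, q_gt, beta=500.0):
    """Eq. (2.6): position error plus beta-weighted orientation error.
    x_*: 3-vectors (camera position); q_*: 4-vectors (quaternions).
    beta here is an assumed placeholder; it must be tuned per scene."""
    pos_err = np.linalg.norm(x_pred - x_gt)
    ori_err = np.linalg.norm(q_pred - q_gt / np.linalg.norm(q_gt))
    return pos_err + beta * ori_err
```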

Chapter 3
Creating Framework

In our work, we create a framework for learning-based 3D reconstruction of building interiors from multiple 2D images that capture the entire scene of interest. Learning to reconstruct requires estimating the camera poses, which give an estimate of the pose of the scene of interest with respect to the world coordinate frame; we therefore employ PoseNet to regress the camera pose. We first focus on creating the ground-truth data needed to train the pose regression, and then visualize the camera poses to gain useful insights into the data.

3.1 Data Collection and Processing

3.1.1 Experimental Setup and Procedure

To generate the ground truth, we captured RGB-D images, which carry an additional depth component alongside the three RGB color components. For capturing RGB-D images we used a Microsoft Kinect v2 together with Kinect Fusion for 3D reconstruction, and we captured the interiors of the Swarath Lab, IIIT-Delhi. The Kinect Fusion SDK performs 3D reconstruction of scenes in real time; it was used to reconstruct the lab in patches, and these patches were stitched together externally using ICP. Thus two levels of reconstruction were used to obtain the complete model of the lab. RANSAC tends to be very expensive when highly accurate model correspondences are required, but it is fast at lower accuracy; ICP, on the other hand, is slow but produces much better registration. We therefore use a combination of the two for model registration: first RANSAC aligns the significant planes in the model, and then ICP fine-tunes the registration to obtain an accurate model (a code sketch of this coarse-to-fine scheme is given at the end of Section 3.1).

3.1.2 Challenges

Due to physical constraints, the whole lab could not be captured in one run, so we created multiple patches from different locations in the lab, as shown in Figure 3.1. At each location we rotated the Kinect to cover its entire field of view as visible in the KinectFusion tool. After collecting the data for each portion of the lab, we stitched all the models together using ICP, with initializations based on visual features in the RGB-D images. While obtaining 3D models with Kinect Fusion, we obtained models with improper registration wherever too few features were available, so we introduced additional visual features by placing distinctly colored objects as markers; this helped us obtain more accurate models of the lab. In the final model obtained by fusing the different patches of the lab together, we see some irregularity, as shown in Figure 7.1: the walls of one portion of the lab are not registered properly. This kind of problem, created by the accumulation of errors over all the patches, is referred to as loop closure.

Figure 3.1: Patches of models. (a) to (e) represent different lab patches that were stitched to create the composite model. In all we created 18 patches.

3.1.3 Results

The point cloud data generated from mapping the Swarath Lab interiors is visualized using the open-source tool MeshLab, as shown in Figure 3.2. The camera poses are represented by identical frustums with different positions and orientations; these serve as the ground truth for training purposes. We also collected RGB-D images at the respective camera poses. RGB-D images are stored as two separate images: an RGB image and a greyscale image representing a scaled depth map.
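The coarse-to-fine registration of Section 3.1.1 can be approximated with off-the-shelf tools. Below is a hedged sketch using Open3D's RANSAC plane segmentation and ICP refinement; the actual stitching in this work was done with the Kinect Fusion SDK and external tools, so this illustrates the scheme rather than reproducing the code used:

```python
import numpy as np
import open3d as o3d

def align_dominant_planes(src, dst):
    """Coarse init: rotate src so its dominant RANSAC-fit plane normal matches dst's."""
    (a1, b1, c1, _), _ = src.segment_plane(0.02, ransac_n=3, num_iterations=1000)
    (a2, b2, c2, _), _ = dst.segment_plane(0.02, ransac_n=3, num_iterations=1000)
    n1 = np.array([a1, b1, c1]); n1 /= np.linalg.norm(n1)
    n2 = np.array([a2, b2, c2]); n2 /= np.linalg.norm(n2)
    # Rodrigues rotation taking n1 to n2; antiparallel normals are ignored in this sketch
    v = np.cross(n1, n2); s = np.linalg.norm(v); c = n1 @ n2
    K = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    R = np.eye(3) if s < 1e-8 else np.eye(3) + K + K @ K * ((1 - c) / s**2)
    T = np.eye(4); T[:3, :3] = R
    return T

def register(src, dst):
    """RANSAC plane alignment for initialization, then point-to-point ICP refinement."""
    init = align_dominant_planes(src, dst)
    result = o3d.pipelines.registration.registration_icp(
        src, dst, max_correspondence_distance=0.05, init=init,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation
```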

Figure 3.2: Composite model. Visualization of the composite model of the lab created using ICP. Frustums represent the camera poses: the tip of a frustum gives the camera position, and the normal to its base gives the camera orientation.

PART - II (Winter Semester, 2017)

Chapter 4
Summary and Overview

4.1 Summary from Previous Part

In the previous part, we introduced the problem of 3D reconstruction as the generation of 3D geometrical models from 2D images. We discussed the techniques involved in 3D reconstruction using structure from motion (SfM), namely (i) feature point identification and correspondences, (ii) depth estimation from 2D images, and (iii) 3D model registration (based on ICP, RANSAC or a combination of both). Apart from traditional SfM techniques, we discussed how learning can be incorporated into the 3D reconstruction problem, along with the advantages and disadvantages of using learning algorithms. Employing learning, we reviewed important elements of 3D geometry and how they can be inferred using deep learning techniques. We showed that convolutional neural networks (CNNs) can be employed to find geometrical constructs such as depth, surface normals, scene semantics and camera poses. We described the three-in-one architecture proposed by Eigen et al. [4] for predicting depth, surface normals and semantic segmentation with a single CNN, and the PoseNet architecture proposed by Kendall et al. [7] for regressing the 6-DoF camera pose from an RGB image at test time.

Previously we also started building an end-to-end framework that could aid in generating 3D reconstructed models. We used the dataset collected from the Swarath Lab at IIIT-Delhi to generate ground truth via SfM for training, and used this data to fine-tune the PoseNet architecture for predicting camera poses; the resulting errors are shown in Figure 4.1. Furthermore, we produced some reconstruction results from SfM along with camera pose visualizations, and discussed the challenges and shortcomings of the generated model.

4.2 Overview

The second part of this thesis covers the work done in the second (Winter 2017) semester of the B.Tech Project. The first part was mostly focused on a literature survey and some elementary experiments, with the rest devoted to data collection for training. In the second part, the focus was on implementing some of the ideas discussed before, along with exploring different architectures for the various tasks of 3D reconstruction.

Figure 4.1: Errors in estimating camera position and camera orientation with the pretrained PoseNet model.

The next few chapters are organized as follows. Chapter 5 discusses ideas that help in computing geometric elements useful for reconstructing 3D geometric models. Chapter 6 presents the pipeline we propose, the experiments we performed, and the results of these experiments. Finally, Chapter 7 outlines future directions and interesting problems that can be addressed.

Chapter 5
Deep Learning for Elements of 3D Reconstruction

This chapter focuses on components of 3D reconstruction that can be combined to develop an end-to-end pipeline for the task. We discuss the performance of deep learning architectures on synthetically generated datasets, and then focus on an efficient object pose estimation technique that uses regression with a different class of deep learning architectures, the Siamese network.

5.1 SceneNet - Synthetic Dataset

SceneNet RGB-D [8] is a large-scale dataset of photorealistic RGB-D videos providing complete ground truth for a wide range of problems related to indoor scenes. The dataset is well suited to various computer vision problems such as semantic segmentation, estimation of geometric constructs, 3D reconstruction and metric SLAM. It contains 57 trajectories covered in 1000 video sequences, each consisting of 300 frames per video sequence. The dataset provides the RGB-D video sequences along with semantic annotations in some cases, per-frame camera poses (with accurate measurements between the opening and closing of the camera aperture), and class and geometric orientation information for all the CAD models present in the scene. Sample data from the dataset can be seen in Figure 5.1.

We used the SceneNet RGB-D dataset to fine-tune the pre-trained PoseNet model for regressing camera poses from single RGB images. The synthetic dataset could not yield accurate results for indoor scenes, with a positional error of 1.8 m and a large orientation error of 60 to 70 degrees on average. One possible reason for this behaviour is the availability of only a very small number of training samples per trajectory, namely 300.

5.2 Object Pose Estimation Using Siamese Network

The paper [3] presents an interesting idea for regressing object pose. The main problem it addresses is the regression of an object's pose in angle space, guided by a feature space. To improve plain regression in angle space, a constraint is applied to pairs of samples in a particular feature space: the distance between any two samples should be equal, or in the ideal case proportional, in both the angle space and the feature space.

Figure 5.1: Beyond the properties shown, the dataset can also provide optical flow vectors for the scene and class segmentation information with minimal processing.

We take as input a pair of RGB images x_1, x_2 and pass it through the Siamese regression network. The Siamese architecture consists of two (or more) branches of the same CNN that share weights and encode the two inputs in parallel. For this application, the CNN consists of two convolutional layers followed by two fully convolutional layers and finally a fully connected layer that outputs the 6-DoF object pose in quaternion representation. The Siamese regression uses the following loss function:

l_f = \sum_{n=1}^{K} \Big( \lVert f(x_{n,1}) - f(x_{n,2}) \rVert_2^2 - \lVert y_{n,1} - y_{n,2} \rVert_2^2 \Big)   (5.1)

Here, f(x_i) denotes the mapping of image x_i into the feature space of interest, y_i is the mapping of x_i in angle space (and also its training label), and the sum runs over the K training samples. The speciality of this configuration is that the Siamese network is needed only during training; at test time a single branch of the architecture, without any siblings, suffices, as expressed in Figure 5.2. The final loss function combines the feature (Siamese regression) loss l_f with a regression loss defined as

l_R = \sum_{n=1}^{K} \lVert g(f(x_n)) - y_n \rVert_2^2   (5.2)

where g(.) is a regression layer function. For better results, this combination is further combined with L2 regularization.

The most crucial part of this learning algorithm is learning a feature space that satisfies the aforementioned constraints while remaining discriminative over the input data. This is achieved with a learning technique known as triplet learning. Each triplet of training samples contains an anchor, a positive sample and a negative sample; we then optimize the anchor and the positive sample to be closer in feature space than the anchor and the negative one. A further improvement is made while preprocessing the dataset: creating well-formed batches with equal representation of positive and negative samples with respect to an anchor was found to improve results compared with a random ordering of the dataset.

Figure 5.2: The Siamese architecture essentially regresses relative distances between the feature and pose spaces. During testing, we extract one branch of the network and use it for regression.
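A sketch of the feature-space loss (5.1) and the regression loss (5.2) in Python/NumPy. The squared penalty on the distance mismatch in feature_loss is an assumption made here for illustration, since the outer operation is left implicit above; this is not the exact formulation of [3]:

```python
import numpy as np

def feature_loss(f1, f2, y1, y2):
    """Eq. (5.1): penalize mismatch between feature-space and angle-space distances.
    f1, f2: KxD feature embeddings of the image pairs; y1, y2: Kx4 pose labels.
    The squared outer penalty is an assumption; the report leaves it implicit."""
    d_feat = ((f1 - f2) ** 2).sum(axis=1)     # ||f(x_n,1) - f(x_n,2)||^2
    d_pose = ((y1 - y2) ** 2).sum(axis=1)     # ||y_n,1 - y_n,2||^2
    return ((d_feat - d_pose) ** 2).sum()

def regression_loss(g_of_f, y):
    """Eq. (5.2): plain L2 regression loss on the predicted poses."""
    return ((g_of_f - y) ** 2).sum()
```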

Chapter 6
Pipeline for 3D Reconstruction

6.1 About the Pipeline

For a complete end-to-end 3D reconstruction pipeline, we satisfy a dual objective: recovering camera poses and depth information for generating the 3D model. Depth maps encode the details of the local geometry of a 3D model, while camera poses provide the global information, since they convey the relative poses and transformations between views in the world coordinate frame and thus help in assembling a global picture. Finally, for registration of the different patches of the reconstructed model, we use surface normal information. An overview of the pipeline is shown in Figure 6.1. For estimating camera poses we used simple SfM techniques, and for depth estimation we used the CNN architecture proposed by Eigen et al. [4] to generate depth maps and surface normal maps from a single RGB image.

Figure 6.1: A block diagram representing the working of our proposed pipeline.

6.2 Experiments and Results

We generated depth maps and surface normals using the same method as in [4]. The input RGB images were collected using a Kinect and were downscaled to match the dimensions of the input layer of VGGNet; the obtained depth and surface normal maps had similar dimensions. The results generated by this method on our Swarath Lab dataset are shown in Figure 6.2. For estimating the camera poses we used SfM.

Figure 6.2: (From left to right) Original RGB image; corresponding (color-coded) predicted depth map; corresponding predicted surface normal map.
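The pipeline's two outputs combine by back-projecting each predicted depth map into a camera-frame point cloud and carrying it into the world frame with the estimated pose. A minimal sketch, where the pinhole intrinsics (fx, fy, cx, cy) and the world-from-camera pose (R, t) are assumed symbols for illustration, not values from the report:

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy, R, t):
    """Lift an HxW depth map to world-frame 3D points.
    depth in metres; (fx, fy, cx, cy): pinhole intrinsics;
    (R, t): world-from-camera rotation and translation (e.g. from PoseNet or SfM)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx                     # camera-frame coordinates
    y = (v - cy) * z / fy
    pts_cam = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    pts_cam = pts_cam[z.reshape(-1) > 0]      # drop invalid (zero-depth) pixels
    return pts_cam @ R.T + t                  # transform into the world frame
```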

Chapter 7
Future Works

7.1 Model Improvement

There is good scope for improving the model. The loop closure problem (see Figure 7.1) is one issue that needs to be addressed for accurate representation of any interior scene; a better feature matching scheme (that is, one creating robust correspondences between model features) together with bundle adjustment could be used to solve it. Other improvements include completing the mapping by including floor and ceiling models along with the walls of the interior rooms.

7.2 Removing Constraints - Exploring Incomplete Data

In the present work we use RGB-D information, which describes the entire 3D information, to train PoseNet, and we presently assume that the given data completely describes the scene of interest. In future we aim to solve, within this framework, the problem of reconstruction from incomplete 2D data, such as an incomplete set of 2D images, or the floor plan of a room or a building interior.

Figure 7.1: Effects of the loop closure problem due to accumulated errors from all patches.

Bibliography

[1] GoogLeNet component diagram. Available at googlenet_keras/googlenet_components.png.

[2] Besl, P. J., and McKay, N. D. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 14, 2 (Feb 1992).

[3] Doumanoglou, A., Balntas, V., Kouskouridas, R., and Kim, T.-K. Siamese regression networks with efficient mid-level feature extraction for 3D object pose estimation. arXiv preprint (2016).

[4] Eigen, D., and Fergus, R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proceedings of the IEEE International Conference on Computer Vision (2015).

[5] Fischler, M. A., and Bolles, R. C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24, 6 (June 1981).

[6] Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., and Darrell, T. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia (2014), ACM.

[7] Kendall, A., Grimes, M., and Cipolla, R. PoseNet: A convolutional network for real-time 6-DOF camera relocalization. In Proceedings of the IEEE International Conference on Computer Vision (2015).

[8] McCormac, J., Handa, A., Leutenegger, S., and Davison, A. J. SceneNet RGB-D: 5M photorealistic images of synthetic indoor trajectories with ground truth. CoRR (2016).

[9] Mitchell, T. M. Machine Learning, 1st ed. McGraw-Hill, Inc., New York, NY, USA, 1997.

[10] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015).

[11] Silberman, N., Hoiem, D., Kohli, P., and Fergus, R. Indoor segmentation and support inference from RGBD images. In European Conference on Computer Vision (2012), Springer.


More information

The Hilbert Problems of Computer Vision. Jitendra Malik UC Berkeley & Google, Inc.

The Hilbert Problems of Computer Vision. Jitendra Malik UC Berkeley & Google, Inc. The Hilbert Problems of Computer Vision Jitendra Malik UC Berkeley & Google, Inc. This talk The computational power of the human brain Research is the art of the soluble Hilbert problems, circa 2004 Hilbert

More information

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University.

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University. Visualizing and Understanding Convolutional Networks Christopher Pennsylvania State University February 23, 2015 Some Slide Information taken from Pierre Sermanet (Google) presentation on and Computer

More information

Structured light 3D reconstruction

Structured light 3D reconstruction Structured light 3D reconstruction Reconstruction pipeline and industrial applications rodola@dsi.unive.it 11/05/2010 3D Reconstruction 3D reconstruction is the process of capturing the shape and appearance

More information

Finding Tiny Faces Supplementary Materials

Finding Tiny Faces Supplementary Materials Finding Tiny Faces Supplementary Materials Peiyun Hu, Deva Ramanan Robotics Institute Carnegie Mellon University {peiyunh,deva}@cs.cmu.edu 1. Error analysis Quantitative analysis We plot the distribution

More information

Using Machine Learning for Classification of Cancer Cells

Using Machine Learning for Classification of Cancer Cells Using Machine Learning for Classification of Cancer Cells Camille Biscarrat University of California, Berkeley I Introduction Cell screening is a commonly used technique in the development of new drugs.

More information

3D Photography: Stereo

3D Photography: Stereo 3D Photography: Stereo Marc Pollefeys, Torsten Sattler Spring 2016 http://www.cvg.ethz.ch/teaching/3dvision/ 3D Modeling with Depth Sensors Today s class Obtaining depth maps / range images unstructured

More information

Plane Based Free Stationing for Building Models

Plane Based Free Stationing for Building Models Christian MANTHE, Germany Key words: plane based building model, plane detection, plane based transformation SUMMARY 3D Building models are used to construct, manage and rebuild buildings. Thus, associated

More information

CS 4758: Automated Semantic Mapping of Environment

CS 4758: Automated Semantic Mapping of Environment CS 4758: Automated Semantic Mapping of Environment Dongsu Lee, ECE, M.Eng., dl624@cornell.edu Aperahama Parangi, CS, 2013, alp75@cornell.edu Abstract The purpose of this project is to program an Erratic

More information

Contexts and 3D Scenes

Contexts and 3D Scenes Contexts and 3D Scenes Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem Administrative stuffs Final project presentation Nov 30 th 3:30 PM 4:45 PM Grading Three senior graders (30%)

More information

Motion Tracking and Event Understanding in Video Sequences

Motion Tracking and Event Understanding in Video Sequences Motion Tracking and Event Understanding in Video Sequences Isaac Cohen Elaine Kang, Jinman Kang Institute for Robotics and Intelligent Systems University of Southern California Los Angeles, CA Objectives!

More information

Image processing and features

Image processing and features Image processing and features Gabriele Bleser gabriele.bleser@dfki.de Thanks to Harald Wuest, Folker Wientapper and Marc Pollefeys Introduction Previous lectures: geometry Pose estimation Epipolar geometry

More information

Object Localization, Segmentation, Classification, and Pose Estimation in 3D Images using Deep Learning

Object Localization, Segmentation, Classification, and Pose Estimation in 3D Images using Deep Learning Allan Zelener Dissertation Proposal December 12 th 2016 Object Localization, Segmentation, Classification, and Pose Estimation in 3D Images using Deep Learning Overview 1. Introduction to 3D Object Identification

More information

Using Augmented Measurements to Improve the Convergence of ICP. Jacopo Serafin and Giorgio Grisetti

Using Augmented Measurements to Improve the Convergence of ICP. Jacopo Serafin and Giorgio Grisetti Jacopo Serafin and Giorgio Grisetti Point Cloud Registration We want to find the rotation and the translation that maximize the overlap between two point clouds Page 2 Point Cloud Registration We want

More information

Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting

Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting R. Maier 1,2, K. Kim 1, D. Cremers 2, J. Kautz 1, M. Nießner 2,3 Fusion Ours 1

More information