3D model search and pose estimation from single images using VIP features
Changchang Wu 2, Jan-Michael Frahm 2
2 Department of Computer Science, UNC Chapel Hill, USA
{ccwu,jmf}@cs.unc.edu

Friedrich Fraundorfer 1, Marc Pollefeys 1,2
1 Department of Computer Science, ETH Zurich, Switzerland
{fraundorfer, marc.pollefeys}@inf.ethz.ch

Abstract

This paper describes a method to efficiently search for 3D models in a city-scale database and to compute camera poses from single query images. The proposed method matches SIFT features (from a single image) to viewpoint invariant patches (VIP) from a 3D model by warping the SIFT features approximately into the orthographic frame of the VIP features. This significantly increases the number of feature correspondences, which results in a reliable and robust pose estimation. We also present a 3D model search tool that uses a visual word based search scheme to efficiently retrieve 3D models from large databases using individual query images. Together, the 3D model search and the pose estimation represent a highly scalable and efficient city-scale localization system. The performance of the 3D model search and pose estimation is demonstrated on urban image data.

1. Introduction

Searching for 3D models is a key feature in city-wide localization and pose estimation from mobile devices. From a single snapshot image the corresponding 3D model needs to be found, and 3D-2D matches between the model and the image need to be established to estimate the user's pose (see illustration in Fig. 1). The main challenges so far are the correspondence problem (3D-2D) and the scalability of the approach. In this paper we contribute to both of these topics. The first contribution is a 3D-2D matching method that is based on viewpoint invariant patches (VIP) and can deal with severe viewpoint changes. The second contribution is the use of a visual word based recognition scheme for efficient and scalable database retrieval. Our database consists of small individual 3D models that represent parts of a large scale reconstruction. Each 3D model is textured and is represented by a collection of VIP features in the database. When querying with an input image, the input image's SIFT features are matched with the database's VIP features to determine the corresponding 3D model. Finally, 3D-2D matches between the 3D model and the input image are established for pose estimation.

Figure 1. Mobile vision based localization: A single image from a mobile device is used to search for the corresponding 3D model in a city-scale database and thus determine the user's location. SIFT features extracted from the query image are matched to VIP features from the 3D models in the database. (Panels: query image; 3D model from 1.3 M images; matching part of the 3D model.)

So far, viewpoint invariant patches (VIP) have been used for registering 3D models [9]. The main idea is to create ortho-textures for the 3D models and detect local features, e.g. SIFT, on them. For this, planes in the 3D model are detected and a virtual camera is set fronto-parallel to each plane. Features are then extracted from the virtual camera image, from which the perspective transformation of the initial viewpoint change is removed. In this paper we extend this method to create matches between a 3D model and a single image (3D-2D). In the original method the features from both models are represented in the canonical (orthographic) form. In our case only the
features from the 3D model are represented in the canonical form, while the features from the single image are perspectively transformed. While matching will not work for features under large perspective transformation, features which are almost fronto-parallel will match very well with the canonical representation. Under the assumption that the camera of the query image and the 3D plane of the matching features are parallel, we can generate hypotheses for the camera pose of the query image. Using these hypotheses we can warp parts of the query image so that they match the perspective transform of the canonical features of the 3D model. This allows us to generate many additional matches for robust and reliable pose estimation. For exhaustive search in large databases this method would be too slow; therefore we use the method described by Nistér and Stewénius [5] for efficient model search. The model search works with quantized SIFT (and VIP) descriptor vectors, so-called visual words.

The paper is structured in the following way. The following section describes relevant related work. Section 3 describes the first contribution of this paper, pose estimation using VIP and SIFT features. Section 4 describes how to search for 3D models in large databases efficiently. Section 5 shows experiments on urban image data, and finally Section 6 draws some conclusions.

2. Related work

Many texture based feature detectors and descriptors have been developed for robust wide-baseline matching. One of the most popular is Lowe's SIFT detector [3]. The SIFT detector defines a feature's scale in scale space and a feature orientation from the gradient histogram in the image plane. Using the orientation, the SIFT detector generates normalized image patches to achieve invariance to 2D similarity transformations. Many feature detectors, including affine covariant features, use the SIFT descriptor to represent patches. SIFT descriptors are also used to encode VIP features; however, the VIP approach works with other feature descriptors, too. Mikolajczyk et al. give a comparison of several local features in [4]. The recently proposed VIP features [9] go beyond affine invariance to robustness to projective transformations. The authors investigated the use of VIP features to align 3D models, but they did not investigate the case of matching VIPs to features from single images.

Most vision based location systems so far have been demonstrated on small databases [6, 8, 11]. Recently, Schindler et al. [7] presented a scheme for city-scale environments. The method uses a visual word based recognition scheme following the approach in [5, 2]. However, Schindler et al. only focused on location recognition; the pose of the user is not computed. Our proposed method combines both scalable location recognition and pose estimation. Pose estimation alone is the focus of the work in [10]. The authors propose a method to accurately compute the camera pose from 3D-2D matches. High accuracy is achieved by extending the set of initial matches with region growing. Their method could be used as a last step in our localization approach to refine the computed pose.

3. Pose from SIFT-VIP matches

3.1. Viewpoint-Invariant Patch (VIP) detection

Figure 2. VIPs detected on a 3D model.

VIPs are features that can be extracted from textured 3D models, which combine images with corresponding depth maps. VIPs are invariant to 3D similarity transformations.
They can be used to robustly and efficiently align 3D models of the same scene from videos taken from significantly different viewpoints. In this paper we will mostly consider 3D models obtained from video by structure from motion (SfM), but the method is equally applicable to textured 3D models obtained using LIDAR or other sensors. The robustness to 3D similarities exactly corresponds to the ambiguity of 3D models obtained from images, while the ambiguities of other sensors can often be described by a 3D Euclidean transformation or with even fewer degrees of freedom. The undistortion is based on local scene planes or on local planar approximations of the scene. Conceptually, for every point on the surface the local tangent plane's normal is estimated and a texture patch is generated by orthogonal projection onto the plane. Within the local ortho-texture patch it is determined whether the point corresponds to a local extremal response of the Difference-of-Gaussians (DoG) filter in scale space. If it is, the orientation is determined in the tangent plane by the dominant gradient direction and a SIFT descriptor on the tangent plane is extracted. Using the tangent plane avoids the poor repeatability of interest point detection under projective transformations seen in popular feature detectors [4].
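To make the ortho-texture idea concrete, the following minimal sketch (Python with NumPy; an illustration under stated assumptions, not the authors' implementation) samples a patch by orthographic projection onto a local tangent plane. The camera intrinsics K, pose (R, t), patch center X, and unit normal n are assumed given, and the image is assumed grayscale.

```python
import numpy as np

def plane_basis(n):
    """Orthonormal in-plane axes (u, v) for a unit plane normal n."""
    a = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(n, a); u /= np.linalg.norm(u)
    v = np.cross(n, u)
    return u, v

def ortho_texture_patch(img, K, R, t, X, n, half_size, res):
    """Sample a (2*res+1)^2 ortho-texture patch around the 3D point X:
    points on the tangent plane are projected into the source view and
    looked up with nearest-neighbour sampling (for brevity)."""
    u, v = plane_basis(n)
    coords = np.linspace(-half_size, half_size, 2 * res + 1)
    patch = np.zeros((len(coords), len(coords)))
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            Xp = X + a * u + b * v            # 3D point on the local plane
            x = K @ (R @ Xp + t)              # perspective projection
            px, py = int(x[0] / x[2] + 0.5), int(x[1] / x[2] + 0.5)
            if 0 <= py < img.shape[0] and 0 <= px < img.shape[1]:
                patch[j, i] = img[py, px]
    return patch
```

The DoG test and descriptor extraction described above would then run on such a patch instead of the original image.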
Viewpoint-normalized image patches need to be generated to describe VIPs. Viewpoint normalization is similar to the normalization of image patches according to scale and orientation performed in SIFT, and to the normalization according to an ellipse in affine covariant feature detectors. The viewpoint normalization can be divided into the following steps:

1. Warp the image texture for each 3D point, conceptually, using an orthographic camera with optical axis parallel to the local surface normal passing through the 3D point. This step generates an ortho-texture patch and makes the VIP invariant to the intrinsics and extrinsics of the original camera.

2. Verify the VIP, and find its orientation and size. Keep a 3D point as a VIP feature only when its corresponding pixel in the ortho-texture patch is a stable 2D image feature. As in [3], a DoG filter and local extrema suppression are used. The VIP orientation is found from the dominant gradient direction in the ortho-texture patch. With the virtual camera, the size and orientation of a VIP can be obtained by transforming the scale and orientation of its corresponding image feature to world coordinates.

A VIP is then fully defined as (x, σ, n, d, s), where x is its 3D position, σ is the patch size, n is the surface normal at this location, d is the texture's dominant orientation as a vector in 3D, and s is the SIFT descriptor that describes the viewpoint-normalized patch. Note that a SIFT feature is a SIFT descriptor plus its position, scale and orientation. Fig. 2 shows VIP features detected on a 3D model.

3.2. Matching VIP with SIFT

Figure 3. (a) Initial SIFT-VIP matches. Most matches are, as expected, on the fronto-parallel plane (left image is the query image). (b) Camera pose estimated from a SIFT-VIP match (red). (c) Resulting set of matches established with the proposed method. The initial set of 17 matches could be extended to 92 correct matches. The method established many matches on the other plane, too.

To match SIFT features from a single image with VIP features from a 3D model, the SIFT features extracted from the image need to be fronto-parallel (or close to it) with respect to the VIP features in the model. This might hold only for a fraction of features, whose plane is accidentally parallel to the image plane of the camera. For all other features we warp the corresponding image areas so that they approximately match the canonical form of the VIP features. The projective warp can be computed in the following steps:

1. Compute the approximate camera position of the query image in the local coordinate frame from at least one fronto-parallel SIFT-VIP match.

2. Determine the image areas that need to be warped by projecting the VIP features of the model into the query image.

3. Compute the warp homography for each image area from the 3D plane of the VIP and the estimated camera pose.

The whole idea is based on the assumption that initial matches between VIP and SIFT features are fronto-parallel (see Fig. 3(a) for example matches). This assumption allows us to compute an estimate for the camera pose of the query image.
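The initial SIFT-VIP correspondences assumed in step 1 can be obtained by nearest-neighbour matching of the SIFT descriptors. A minimal sketch, assuming plain NumPy descriptor arrays and using Lowe's ratio test as the acceptance criterion (an assumed but common choice; the paper does not specify one):

```python
import numpy as np

def match_sift_to_vip(query_desc, vip_desc, ratio=0.8):
    """Nearest-neighbour matching of query SIFT descriptors (m x 128)
    against VIP descriptors (k x 128), accepted with Lowe's ratio test."""
    matches = []
    for i, d in enumerate(query_desc):
        dist = np.linalg.norm(vip_desc - d, axis=1)
        nn = np.argsort(dist)[:2]             # two nearest neighbours
        if dist[nn[0]] < ratio * dist[nn[1]]: # unambiguous nearest neighbour
            matches.append((i, int(nn[0])))
    return matches
```

Only the matches that happen to be (nearly) fronto-parallel will survive this step; they are exactly the hypotheses used below.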
The VIP feature is located on a plane in 3D and is defined by the feature's center point X (in 3D) and the normal vector n of the plane. Our assumption is that the image plane of the SIFT feature is parallel to this plane and that the principal ray of the camera is in the direction of n and connects X and the center of the SIFT feature x. This fixes the camera pose along the normal vector n. The distance d between the camera center and the plane can be computed from the scale ratio of the matched features with the help of the focal length f:

d = f S / s    (1)

The focal length f of the camera can be taken from the EXIF data of the image or from camera calibration. S is the scale of the VIP feature and s is the scale of the matching SIFT feature. The missing rotation around the principal axis r can finally be recovered from the dominant gradient direction of the image patches. Fig. 3(b) shows a camera pose estimated from a SIFT-VIP match.

Now, with the camera P fully defined, this approximation can be used to compute the necessary warps. For each VIP feature in the 3D model we determine the corresponding image region in the query image by projecting the VIP region (specified by center point and scale) onto the image plane. Next we compute the homography transform H that warps our image region to the canonical form of the VIP feature:

H = R + (1/d) T N^T    (2)

where R and T are the rotation and translation from P to the virtual camera of the VIP feature, and N is the normal vector of the VIP plane in the coordinate system of P. Finally, we look for stable 2D image features in the warped image area by applying the SIFT detector.

Clearly our assumptions are not met exactly, which results in an inaccurate camera pose estimate. SIFT descriptors, which were developed for wide-baseline matching, enable matching within a certain range of viewpoint change, and thus the camera plane might not be exactly parallel to the VIP feature plane. However, we do not depend on an exact pose estimate for this step. We account for the uncertainty in the camera pose by enlarging the region to warp. In addition, remaining differences between the VIP and SIFT features can be compensated by SIFT matching. Fig. 3 shows examples of final SIFT-VIP matches. The initial matching between SIFT and VIP features results in 17 matches. From this a camera pose estimate can be computed, which allows warping the SIFT detections in the input image into an approximately fronto-parallel configuration. Matching the rectified SIFT detections with the VIP features yields 92 correct matches.
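Equations (1) and (2) translate directly into code. The sketch below, a simplified illustration rather than the authors' implementation, forms a pose hypothesis from one fronto-parallel SIFT-VIP match and builds the warp homography in normalized camera coordinates; the in-plane rotation from the dominant gradient directions and the calibration matrices are omitted for brevity, and the camera-axis convention is an assumption.

```python
import numpy as np

def pose_hypothesis(X, n, f, S, s):
    """Camera pose hypothesis from a single fronto-parallel SIFT-VIP match.
    X: 3D center of the VIP, n: unit plane normal, f: focal length,
    S, s: scales of the VIP and of the matching SIFT feature."""
    d = f * S / s                       # camera-to-plane distance, eq. (1)
    C = X + d * n                       # camera center along the plane normal
    z = -n                              # principal ray looks back at the plane
    up = np.array([0.0, 1.0, 0.0])      # assumes z is not parallel to world up
    x = np.cross(up, z); x /= np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.stack([x, y, z])             # world-to-camera rotation
    t = -R @ C
    return R, t, d

def warp_homography(R_rel, T_rel, N, d):
    """Plane-induced homography of eq. (2): H = R + (1/d) T N^T, where
    (R_rel, T_rel) map the hypothesized camera P to the VIP's virtual
    camera and N is the plane normal expressed in P's frame."""
    return R_rel + np.outer(T_rel, N) / d
```

In pixel coordinates the warp would additionally be conjugated with the two calibration matrices before resampling the image region.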
3.3. Pose estimation

The 3D-2D matches between VIP and SIFT features can now be used to compute the camera pose accurately and thus determine the location of the user within the map. The main benefit for pose estimation is that we can significantly increase the number of feature matches, which results in a reliable and robust pose estimation. An outline of the complete localization method is given in Algorithm 1.

Algorithm 1: 3D model search and pose estimation
1: Extract SIFT features from query image
2: Compute visual word document vector for query image
3: Compute L2 distances to all document vectors in 3D model database (inverted file query)
4: Use 3D model corresponding to the smallest distance as matching 3D model
5: Match SIFT features from query image to VIP features from database 3D model (nearest neighbor matching)
6: Compute camera pose hypotheses from SIFT-VIP matches
7: Warp the query image according to the camera pose hypotheses and extract fronto-parallel SIFT features
8: Match fronto-parallel SIFT features to VIP features
9: Compute final pose from SIFT-VIP matches

4. Efficient 3D model search in large databases

For pose estimation as described in the previous section, the corresponding 3D model needs to be known. For large databases, as necessary for city-wide localization, an exhaustive search through all the 3D models is not possible. Thus a first step prior to pose estimation is to search for the corresponding 3D model. Our database consists of small individual 3D models that represent parts of a large scale vision based 3D reconstruction, created as described in [1]. Each individual 3D model is represented by a set of VIP features extracted from the model texture. These features are used to create a visual word database as described in [5]. This allows for efficient model search to determine the 3D model necessary for pose estimation.

Similar to [5], firstly VIP features are extracted from the 3D models. Each VIP descriptor is quantized by a hierarchical vocabulary tree. All visual words from one 3D model form a document vector, which is a v-dimensional vector where v is the number of possible visual words. It is usually extremely sparse. For a model query, the similarity between the query document vector and all document vectors in the database is computed. As similarity score we use the L2 distance between document vectors. The organization of the database as an inverted file and the sparseness of the document vectors allow very efficient scoring. For scoring, the different visual words are weighted based on the inverse document frequency (IDF) measure.
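The scoring just described can be sketched as follows. This is a minimal single-level illustration in the spirit of [5] (no hierarchical tree, no stop lists) with IDF-weighted, L2-normalized document vectors; for unit vectors the L2 distance follows from the accumulated dot product as ||q - m||^2 = 2 - 2 q.m, so only models sharing a word with the query are ever touched.

```python
import numpy as np
from collections import defaultdict

class InvertedFile:
    """Minimal inverted-file scorer over sparse visual word vectors."""

    def __init__(self, num_words, num_models):
        self.df = np.zeros(num_words)        # document frequency per word
        self.idf = np.zeros(num_words)
        self.postings = defaultdict(list)    # word id -> [(model id, weight)]
        self.num_models = num_models

    def build(self, model_words):
        """model_words: one list of visual word ids per 3D model."""
        for words in model_words:
            for w in set(words):
                self.df[w] += 1
        self.idf = np.log(self.num_models / np.maximum(self.df, 1))
        for m, words in enumerate(model_words):
            vec = defaultdict(float)
            for w in words:                  # term frequency times IDF
                vec[w] += self.idf[w]
            norm = np.sqrt(sum(x * x for x in vec.values())) + 1e-12
            for w, x in vec.items():
                self.postings[w].append((m, x / norm))

    def query(self, words):
        """Rank models by L2 distance between unit document vectors."""
        vec = defaultdict(float)
        for w in words:
            vec[w] += self.idf[w]
        norm = np.sqrt(sum(x * x for x in vec.values())) + 1e-12
        dot = np.zeros(self.num_models)
        for w, x in vec.items():
            for m, y in self.postings[w]:
                dot[m] += (x / norm) * y
        dist = np.sqrt(np.maximum(2.0 - 2.0 * dot, 0.0))
        return np.argsort(dist)              # most similar model first
```

A query then amounts to quantizing the query image's SIFT descriptors with the same vocabulary tree and calling query() with the resulting word ids.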
The database images are ranked by the L2 distance, and the vector with the lowest distance is reported as the most similar match. In a next step, initial SIFT-VIP matches are sought to start the pose estimation algorithm. Corresponding features can be determined efficiently using the quantized visual word description: features with the same visual word are reported as matches, which only takes O(n) time, where n is the number of features.

The visual word description is also very memory efficient. The plain visual word database size is

DB_inv = 4 f I    (3)

where f is the maximum number of visual words per model and I is the number of models in the database. The factor 4 comes from the use of 4-byte integers to hold the model index where a visual word occurred. If we assume an average of 1000 visual words per model, a database containing 1 million models would only need 4 GB of RAM. In addition to the visual words we also need to store the 2D coordinates, scale and rotation of the SIFT features, and additionally the 3D coordinates, plane parameters and virtual camera of the VIP features, which still allows storing a huge number of models in the database.
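The quantized correspondence search described above amounts to bucketing features by word id; a minimal sketch, assuming word ids have already been produced by the vocabulary tree:

```python
from collections import defaultdict

def visual_word_matches(query_words, vip_words):
    """Correspondences between features quantized to the same visual
    word; O(n) via one bucket per word id, as described above."""
    buckets = defaultdict(list)
    for j, w in enumerate(vip_words):
        buckets[w].append(j)
    return [(i, j) for i, w in enumerate(query_words) for j in buckets[w]]
```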
5. Experiments

5.1. SIFT-VIP matching results

We conducted an experiment to compare standard SIFT-SIFT matching with our proposed SIFT-VIP matching. Fig. 4(a) shows the established SIFT-SIFT matches. Only 10 matches could be detected, and many of them are actually mis-matches. When computing the initial SIFT-VIP matches, the number of correspondences increases to 25, most of them correct (see Fig. 4(b)). The proposed method, however, is able to detect 91 correct SIFT-VIP matches, as shown in Fig. 4(c). This is a significantly higher number of matches, which allows a more accurate pose estimation. Note that the matches are nicely distributed over two scene planes. Fig. 4(d) shows the resulting pose estimate in red.

Fig. 5 shows the camera position hypotheses from single SIFT-VIP matches in green. Each match generates one hypothesis. The red camera is the correct camera pose. All the camera estimates are set fronto-parallel to the VIP feature in the 3D model, and therefore the camera estimates generated from the plane not fronto-parallel to the real camera position are off. However, it can be seen that many pose hypotheses are very close to the correct solution. Each of them can be used to extend the initial SIFT-VIP matches to a larger set.

Fig. 6 shows an example with 3 scene planes. The 105 (partially incorrect) SIFT-SIFT matches get extended to 223 correct SIFT-VIP matches on all 3 scene planes. Fig. 6(b) shows examples of orthographic VIP patches. The images show the extracted SIFT patches from the query image, the warped SIFT patches, and the VIP patches of the 3D model (from left to right). Ideally the warped SIFT patches and the VIP patches should be perfectly aligned. However, as the initial SIFT-VIP matches are not exactly fronto-parallel, the camera pose is inaccurate and the patches are not perfectly registered. But the difference is not very large, which means that our simple pose estimation works impressively well.

5.2. 3D model search performance evaluation

In this experiment we show the performance of the 3D model search. The video data to create the models in the first place was acquired with car-mounted cameras while driving through a city. Two cameras were mounted on the roof of a car; one was pointing straight sideways, the other one was pointing forward at a 45° angle. The fields of view of both cameras do not overlap, but as the system is moving over time the captured scene parts will overlap. To retrieve ground truth data for the camera motion, the image acquisition was synchronized with a highly accurate GPS-inertial system. Accordingly, we know the location of the camera for each video frame.

In this experiment a 3D model database represented by VIP features is created from the side camera video. The database is queried with the video frames from the forward camera, which are represented by SIFT features. The database contains 113 3D models, which are queried with 2644 images. The resolution of the query video frames resulted in up to 5000 features per frame. The vocabulary tree used was trained on general image data from the web.

The 3D model search results are visualized by plotting lines between frame-to-3D-model matches (see Fig. 7). The identical camera paths of the forward and side camera are shifted by a small amount in x and y direction to make the matching links visible. We only draw matches below a distance threshold of 10 m so that mis-matches get filtered out. The red markers are the query camera positions and the green markers are the 3D model positions in the database. In the figure the top-10 ranked matches are drawn. Usually one considers the top-n ranked matches as possible hypotheses and verifies the correct one geometrically; in our case this can be done by the pose estimation. Fig. 8 shows some correct example matches.

5.3. 3D model query with cell phone images

We developed an application that allows querying a 3D city-model database (see screenshot in Fig. 9) with arbitrary input images. The database so far contains 851 3D models and the retrieval works in real time. Fig. 9(b) shows an image query with a cell phone image. The cell phone query image has a different resolution and was taken a month later; nevertheless we could match it perfectly.
Figure 4. Comparison of standard SIFT-SIFT matching and our proposed SIFT-VIP method. (a) SIFT-SIFT matches: only 10 matches could be found, most of them mis-matches. (b) Initial SIFT-VIP matches: 25 matches could be found, most of them correct. (c) Resulting set of matches established with the proposed method: the initial set of 25 matches could be extended to 91 correct matches. (d) The SIFT-VIP matches in 3D showing the estimated camera pose (red).

Figure 5. Camera pose hypotheses from SIFT-VIP matches (green). The ground truth camera pose of the query image is shown in red. Multiple hypotheses are very close to the real camera pose.

6. Conclusion

In this paper we addressed two important topics in visual localization. Firstly, we investigated the case of 3D-2D pose estimation using VIP and SIFT features. We showed that it is possible to match images to 3D models by matching SIFT features to VIP features. We demonstrated that it is possible to significantly increase the number of initial SIFT-VIP matches by warping the query features into the orthographic frame of the VIP features. This increases the reliability and robustness of pose estimation. Secondly, we demonstrated a 3D model search scheme that scales efficiently up to city scale. Localization experiments with images from camera phones showed that this approach is suitable for city-wide localization from mobile devices.
Figure 6. (a) SIFT-VIP matches and estimated camera pose for a scene with 3 planes. (b) Examples of warped SIFT patches and orthographic VIP patches. From left to right: extracted SIFT patch from the query image, warped SIFT patch, VIP patch of the 3D model. The VIP patches are impressively well aligned to the warped SIFT patches, despite the inaccuracies of the camera pose.

Figure 7. 3D model search. Red markers are query camera positions and green markers are the 3D model positions in the database (axes in meters). Lines show matches below a 10 m distance threshold. Each match should be seen as a match hypothesis which is to be verified by the geometric constraints of pose estimation.

Figure 8. Matches from the 3D model search. Left: query image from the forward camera. Right: retrieved 3D models.
Figure 9. (a) Screenshots of our 3D model search tool. The query image can be selected from a list on the left; as a result the corresponding 3D model shows up. (b) Query with an image from a camera phone.

References

[1] A. Akbarzadeh, J.-M. Frahm, P. Mordohai, B. Clipp, C. Engels, D. Gallup, P. Merrell, M. Phelps, S. Sinha, B. Talton, L. Wang, Q. Yang, H. Stewénius, R. Yang, G. Welch, H. Towles, D. Nistér, and M. Pollefeys. Towards urban 3d reconstruction from video. In 3D Data Processing, Visualization and Transmission, pages 1-8, 2006.
[2] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total recall: Automatic query expansion with a generative feature model for object retrieval. In Proc. 11th International Conference on Computer Vision, Rio de Janeiro, Brazil, 2007.
[3] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.
[4] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool. A comparison of affine region detectors. International Journal of Computer Vision, 65(1-2):43-72, 2005.
[5] D. Nistér and H. Stewénius. Scalable recognition with a vocabulary tree. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, New York City, New York, pages 2161-2168, 2006.
[6] D. Robertson and R. Cipolla. An image-based system for urban navigation. In Proc. 14th British Machine Vision Conference, London, UK, pages 1-10, 2004.
[7] G. Schindler, M. Brown, and R. Szeliski. City-scale location recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, Minnesota, pages 1-7, 2007.
[8] H. Shao, T. Svoboda, T. Tuytelaars, and L. Van Gool. HPAT indexing for fast object/scene recognition based on local appearance. In Conference on Image and Video Retrieval, pages 71-80, 2003.
[9] C. Wu, B. Clipp, X. Li, J.-M. Frahm, and M. Pollefeys. 3d model matching with viewpoint invariant patches (vips). In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[10] G. Yang, J. Becker, and C. Stewart. Estimating the location of a camera with respect to a 3d model. In 3DIM, 2007.
[11] W. Zhang and J. Kosecka. Image based localization in urban environments. In 3D Data Processing, Visualization and Transmission, pages 33-40, 2006.