3D model search and pose estimation from single images using VIP features


Changchang Wu 2, Friedrich Fraundorfer 1, Jan-Michael Frahm 2, Marc Pollefeys 1,2
1 Department of Computer Science, ETH Zurich, Switzerland, {fraundorfer, marc.pollefeys}@inf.ethz.ch
2 Department of Computer Science, UNC Chapel Hill, USA, {ccwu, jmf}@cs.unc.edu

Abstract

This paper describes a method to efficiently search for 3D models in a city-scale database and to compute camera poses from single query images. The proposed method matches SIFT features (from a single image) to viewpoint invariant patches (VIP) from a 3D model by warping the SIFT features approximately into the orthographic frame of the VIP features. This significantly increases the number of feature correspondences, which results in a reliable and robust pose estimation. We also present a 3D model search tool that uses a visual word based search scheme to efficiently retrieve 3D models from large databases using individual query images. Together, the 3D model search and the pose estimation represent a highly scalable and efficient city-scale localization system. The performance of the 3D model search and pose estimation is demonstrated on urban image data.

1. Introduction

Searching for 3D models is a key capability for city-wide localization and pose estimation from mobile devices. From a single snapshot image the corresponding 3D model needs to be found, and 3D-2D matches between the model and the image need to be established to estimate the user's pose (see illustration in Fig. 1). The main challenges so far are the correspondence problem (3D-2D) and the scalability of the approach. In this paper we contribute to both of these topics. The first contribution is a 3D-2D matching method that is based on viewpoint invariant patches (VIP) and can deal with severe viewpoint changes. The second contribution is the use of a visual word based recognition scheme for efficient and scalable database retrieval.

Our database consists of small individual 3D models that represent parts of a large-scale reconstruction. Each 3D model is textured and is represented by a collection of VIP features in the database. When querying with an input image, the input image's SIFT features are matched with the database's VIP features to determine the corresponding 3D model.

Figure 1. Mobile vision based localization: A single image from a mobile device is used to search for the corresponding 3D model in a city-scale database and thus determine the user's location. SIFT features extracted from the query image are matched to VIP features from the 3D models in the database. (Panels: query image; 3D model from 1.3 M images; matching part of the 3D model.)

Finally, 3D-2D matches between the 3D model and the input image are established for pose estimation.

Viewpoint invariant patches (VIP) have so far been used for registering 3D models in [9]. The main idea is to create ortho-textures for the 3D models and detect local features, e.g. SIFT, on them. For this, planes in the 3D model are detected and a virtual camera is set fronto-parallel to each plane. Features are then extracted from the virtual camera image, from which the perspective transformation of the initial viewpoint change is removed. In this paper we extend this method to create matches between a 3D model and a single image (3D-2D). In the original method the features from both models are represented in the canonical (orthographic) form. In our case only the features from the 3D model are represented in the canonical form, while the features from the single image are perspectively transformed.

However, while matching will not work for features under large perspective transformation, features which are almost fronto-parallel will match very well with the canonical representation. Under the assumption that the image plane of the query camera and the 3D plane of the matching features are parallel, we can generate hypotheses for the camera pose of the query image. Using these hypotheses we can warp parts of the query image so that they match the perspective transform of the canonical features of the 3D model. This allows us to generate many more additional matches for robust and reliable pose estimation. For exhaustive search in large databases this method would be too slow; we therefore use the method described by Nistér and Stewénius [5] for efficient model search. The model search works with quantized SIFT (and VIP) descriptor vectors, so-called visual words.

The paper is structured in the following way. The following section describes relevant related work. Section 3 describes the first contribution of this paper, pose estimation using VIP and SIFT features. Section 4 describes how to search for 3D models in large databases efficiently. Section 5 shows experiments on urban image data, and finally Section 6 draws some conclusions.

2. Related work

Many texture based feature detectors and descriptors have been developed for robust wide-baseline matching. One of the most popular is Lowe's SIFT detector [3]. The SIFT detector defines a feature's scale in scale space and a feature orientation from the gradient histogram in the image plane. Using the orientation, the SIFT detector generates normalized image patches to achieve invariance to 2D similarity transformations. Many feature detectors, including affine covariant features, use the SIFT descriptor to represent patches. SIFT descriptors are also used to encode VIP features; however, the VIP approach works with other feature descriptors, too. Mikolajczyk et al. give a comparison of several local features in [4]. The recently proposed VIP features [9] go beyond affine invariance to robustness to projective transformations. The authors investigated the use of VIP features to align 3D models, but they did not investigate the case of matching VIPs to features from single images.

Most vision based location systems so far have been demonstrated on small databases [6, 8, 11]. Recently Schindler et al. [7] presented a scheme for city-scale environments. The method uses a visual word based recognition scheme following the approach in [5, 2]. However, Schindler et al. only focused on location recognition; the pose of the user is not computed. Our proposed method combines both scalable location recognition and pose estimation. Pose estimation alone is the focus of the work in [10]. The authors propose a method to accurately compute the camera pose from 3D-2D matches. High accuracy is achieved by extending the set of initial matches with region growing. Their method could be used as a last step in our localization approach to refine the computed pose.

3. Pose from SIFT-VIP matches

Figure 2. VIPs detected on a 3D model.

3.1. Viewpoint-Invariant Patch (VIP) detection

VIPs are features that can be extracted from textured 3D models, which combine images with corresponding depth maps. VIPs are invariant to 3D similarity transformations.
They can be used to robustly and efficiently align 3D models of the same scene from videos taken from significantly different viewpoints. In this paper we mostly consider 3D models obtained from video by SfM, but the method is equally applicable to textured 3D models obtained using LIDAR or other sensors. The robustness to 3D similarities exactly corresponds to the ambiguity of 3D models obtained from images, while the ambiguities of other sensors can often be described by a 3D Euclidean transformation or with even fewer degrees of freedom.

The undistortion is based on local scene planes or on local planar approximations of the scene. Conceptually, for every point on the surface the local tangent plane's normal is estimated and a texture patch is generated by orthogonal projection onto the plane. Within the local ortho-texture patch it is determined whether the point corresponds to a local extremal response of the Difference-of-Gaussians (DoG) filter in scale space. If it is, the orientation is determined in the tangent plane by the dominant gradient direction, and a SIFT descriptor on the tangent plane is extracted. Using the tangent plane avoids the poor repeatability of interest point detection under projective transformations seen in popular feature detectors [4].
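
As a concrete illustration of this detection step, the following sketch (Python with NumPy and OpenCV; the helper name, the parameterization, and the patch size are our own assumptions, not the authors' implementation) warps the texture around a 3D point with a known tangent-plane normal into an ortho-texture patch, on which the DoG/SIFT machinery can then be run:

```python
import numpy as np
import cv2

def ortho_texture_patch(image, K, R, t, X, n, size_m=1.0, px=128):
    """Warp the image region around the 3D point X onto its tangent plane.

    K, R, t : intrinsics and pose of the real camera (world -> camera)
    X, n    : 3D point on the local scene plane and its unit normal
    Returns a px-by-px fronto-parallel ortho-texture patch covering a
    size_m-by-size_m metric window on the plane, centred at X.
    """
    X, n = np.asarray(X, float), np.asarray(n, float)

    # Orthonormal basis (u, v) spanning the tangent plane.
    u = np.cross(n, [0.0, 0.0, 1.0])
    if np.linalg.norm(u) < 1e-6:               # normal (anti)parallel to z
        u = np.cross(n, [0.0, 1.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(n, u)

    # Project the corners of the metric window into the real image.
    half = size_m / 2.0
    corners = [X - half * u - half * v, X + half * u - half * v,
               X + half * u + half * v, X - half * u + half * v]
    src = []
    for P in corners:
        p = K @ (R @ P + t)                    # pinhole projection
        src.append(p[:2] / p[2])

    # The homography mapping those corners to a square removes the
    # perspective distortion, i.e. it resamples the texture in the
    # plane's orthographic frame.
    dst = np.float32([[0, 0], [px, 0], [px, px], [0, px]])
    H = cv2.getPerspectiveTransform(np.float32(src), dst)
    return cv2.warpPerspective(image, H, (px, px))

# DoG keypoints and descriptors are then extracted on the patch, as in [3]:
# sift = cv2.SIFT_create()
# keypoints, descriptors = sift.detectAndCompute(patch, None)
```

Only points that survive the DoG extremum test on such a patch are kept as VIPs, as the following steps describe.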

Viewpoint-normalized image patches need to be generated to describe VIPs. Viewpoint normalization is similar to the normalization of image patches according to scale and orientation performed in SIFT, and to the normalization according to an ellipsoid in affine covariant feature detectors. The viewpoint normalization can be divided into the following steps:

1. Warp the image texture for each 3D point, conceptually, using an orthographic camera with optical axis parallel to the local surface normal passing through the 3D point. This step generates an ortho-texture patch and makes the VIP invariant to the intrinsics and extrinsics of the original camera.

2. Verify the VIP, and find its orientation and size. Keep a 3D point as a VIP feature only when its corresponding pixel in the ortho-texture patch is a stable 2D image feature. Like [3], a DoG filter and local extrema suppression are used. The VIP orientation is found from the dominant gradient direction in the ortho-texture patch. With the virtual camera, the size and orientation of a VIP can be obtained by transforming the scale and orientation of its corresponding image feature to world coordinates.

A VIP is then fully defined as (x, σ, n, d, s), where x is its 3D position, σ is the patch size, n is the surface normal at this location, d is the texture's dominant orientation as a vector in 3D, and s is the SIFT descriptor that describes the viewpoint-normalized patch. Note that a SIFT feature is a SIFT descriptor plus its position, scale and orientation. Fig. 2 shows VIP features detected on a 3D model.

Figure 3. (a) Initial SIFT-VIP matches. Most matches are, as expected, on the fronto-parallel plane (left image is the query image). (b) Camera pose estimated from a SIFT-VIP match (red). (c) Resulting set of matches established with the proposed method. The initial set of 17 matches could be extended to 92 correct matches. The method established many matches on the other plane, too.

3.2. Matching VIP with SIFT

To match SIFT features from a single image with VIP features from a 3D model, the SIFT features extracted from the image need to be fronto-parallel (or close to fronto-parallel) to the VIP features in the model. This might hold only for a fraction of the features, namely those whose plane is accidentally parallel to the image plane. For all other features we warp the corresponding image areas so that they approximately match the canonical form of the VIP features. The projective warp can be computed along the following steps:

1. Compute the approximate camera position of the query image in the local coordinate frame from at least one fronto-parallel SIFT-VIP match.

2. Determine the image areas that need to be warped by projecting the VIP features of the model into the query image.

3. Compute the warp homography for each image area from the 3D plane of the VIP and the estimated camera pose.

The whole idea is based on the assumption that initial matches between VIP and SIFT features are fronto-parallel (see Fig. 3(a) for example matches). This assumption allows us to compute an estimate for the camera pose of the query image.

The VIP feature is located on a plane in 3D and is defined by the feature's center point X (in 3D) and the normal vector n of the plane. Our assumption is that the image plane of the SIFT feature is parallel to this plane and that the principal ray of the camera is in the direction of n and connects X with the center of the SIFT feature x. This fixes the camera pose along the normal vector n. The distance d between the camera center and the plane can be computed from the scale ratio of the matched features with the help of the focal length f:

d = f S / s    (1)

The focal length f of the camera can be taken from the EXIF data of the image or from camera calibration. S is the scale of the VIP feature and s is the scale of the matching SIFT feature. The missing rotation r around the principal axis can finally be recovered from the dominant gradient directions of the image patches. Fig. 3(b) shows a camera pose estimated from a SIFT-VIP match.

With the camera P now fully defined, this approximation can be used to compute the necessary warps. For each VIP feature in the 3D model we determine the corresponding image region in the query image by projecting the VIP region (specified by center point and scale) onto the image plane. Next we compute the homography transform H that warps our image region to the canonical form of the VIP feature:

H = R + (1/d) T N^T    (2)

where R and T are the rotation and translation from P to the virtual camera of the VIP feature, and N is the normal vector of the VIP plane in the coordinate system of P. Finally, we look for stable 2D image features in the warped image area by applying the SIFT detector.

Clearly our assumptions are not met exactly, which results in an inaccurate camera pose estimate. SIFT descriptors, which were developed for wide-baseline matching, enable matching within a certain range of viewpoint change, and thus the camera plane might not be exactly parallel to the VIP feature plane. However, we do not depend on an exact pose estimate for this step. We account for the uncertainty in the camera pose by enlarging the region to warp. In addition, remaining differences between the VIP and SIFT features can be compensated by SIFT matching.

Fig. 3 shows examples of final SIFT-VIP matches. The initial matching between SIFT and VIP features results in 17 matches. From this a camera pose estimate can be computed, which allows us to warp the SIFT detections in the input image into an approximately fronto-parallel configuration. Matching the rectified SIFT detections with the VIP features yields 92 correct matches.
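
As a minimal sketch of Eqs. (1) and (2) (NumPy; function and variable names are ours, and the choice of the in-plane camera axes is an arbitrary assumption, since the roll is only fixed afterwards from the dominant gradient directions):

```python
import numpy as np

def pose_hypothesis(X, n, f, S, s):
    """Camera pose hypothesis from one fronto-parallel SIFT-VIP match.

    X, n : VIP centre (3D) and unit plane normal (assumed to point
           towards the camera side of the plane)
    f    : focal length in pixels (EXIF or calibration)
    S, s : scales of the VIP and of the matching SIFT feature
    Returns the camera centre C and a rotation R_wc whose rows are the
    camera axes in world coordinates, looking straight at the plane.
    """
    X, n = np.asarray(X, float), np.asarray(n, float)
    d = f * S / s                        # Eq. (1): camera-plane distance
    C = X + d * n                        # centre on the normal through X
    z = -n / np.linalg.norm(n)           # principal axis towards the plane
    x = np.cross([0.0, 1.0, 0.0], z)     # arbitrary in-plane x axis
    if np.linalg.norm(x) < 1e-6:         # degenerate 'up' direction
        x = np.cross([1.0, 0.0, 0.0], z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    return C, np.stack([x, y, z])

def plane_induced_homography(R, T, N, d):
    """Eq. (2): H = R + (1/d) T N^T, the homography induced by the VIP
    plane; R, T are the relative rotation and translation from the
    hypothesised camera P to the VIP's virtual camera, N is the plane
    normal in P's frame, and d is the distance of P from the plane."""
    return np.asarray(R) + np.outer(T, N) / d
```

The homography is applied to an enlarged image region, as noted above, so that the inaccuracy of the single-match pose hypothesis is absorbed before the SIFT detector is re-run.
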
3.3. Pose estimation

The 3D-2D matches between VIP and SIFT features can now be used to compute the camera pose accurately and thus determine the location of the user within the map. The main benefit for pose estimation is that we can significantly increase the number of feature matches, which results in a reliable and robust pose estimation. An outline of the complete localization method is given in Algorithm 1.

Algorithm 1: 3D model search and pose estimation
1: Extract SIFT features from the query image
2: Compute the visual word document vector for the query image
3: Compute L2 distances to all document vectors in the 3D model database (inverted file query)
4: Use the 3D model corresponding to the smallest distance as the matching 3D model
5: Match SIFT features from the query image to VIP features from the database 3D model (nearest neighbor matching)
6: Compute camera pose hypotheses from the SIFT-VIP matches
7: Warp the query image according to the camera pose hypotheses and extract fronto-parallel SIFT features
8: Match the fronto-parallel SIFT features to the VIP features
9: Compute the final pose from the SIFT-VIP matches

4. Efficient 3D model search in large databases

For pose estimation as described in the previous section, the corresponding 3D model needs to be known. For large databases, necessary for city-wide localization, an exhaustive search through all the 3D models is not possible. Thus a first step prior to pose estimation is to search for the corresponding 3D model. Our database consists of small individual 3D models that represent parts of a large-scale vision based 3D reconstruction, created as described in [1]. Each individual 3D model is represented by a set of VIP features extracted from the model texture. These features are used to create a visual word database as described in [5], which allows for efficient model search to determine the 3D model needed for pose estimation.

Similar to [5], VIP features are first extracted from the 3D models. Each VIP descriptor is quantized by a hierarchical vocabulary tree. All visual words from one 3D model form a document vector, a v-dimensional vector where v is the number of possible visual words; it is usually extremely sparse. For a model query, the similarity between the query document vector and all document vectors in the database is computed. As similarity score we use the L2 distance between document vectors. The organization of the database as an inverted file and the sparseness of the document vectors allow very efficient scoring. For scoring, the different visual words are weighted based on the inverse document frequency (IDF) measure. The database models are ranked by the L2 distance, and the vector with the lowest distance is reported as the most similar match.
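
The scoring just described can be made concrete with a small sketch (our own illustrative Python, loosely following [5]; for brevity the IDF weight is applied on the query side only, whereas a full implementation would fold it into both vectors):

```python
import math
from collections import defaultdict

class InvertedFile:
    """Inverted-file scorer over sparse visual-word document vectors.

    For L2-normalised vectors q and m, ||q - m||^2 = 2 - 2 <q, m>, so
    ranking by ascending L2 distance equals ranking by descending dot
    product, and only words occurring in the query must be visited.
    """

    def __init__(self):
        self.postings = defaultdict(list)  # word -> [(model id, tf weight)]
        self.n_models = 0

    @staticmethod
    def _normalised_tf(word_ids):
        tf = defaultdict(int)
        for w in word_ids:
            tf[w] += 1
        norm = math.sqrt(sum(c * c for c in tf.values()))
        return {w: c / norm for w, c in tf.items()}

    def add_model(self, model_id, word_ids):
        for w, weight in self._normalised_tf(word_ids).items():
            self.postings[w].append((model_id, weight))
        self.n_models += 1

    def query(self, word_ids, top_n=10):
        scores = defaultdict(float)
        for w, q_weight in self._normalised_tf(word_ids).items():
            posting = self.postings.get(w, [])
            if not posting:
                continue
            idf = math.log(self.n_models / len(posting))  # IDF weighting
            for model_id, m_weight in posting:
                scores[model_id] += idf * q_weight * m_weight
        return sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]
```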

In a next step, initial SIFT-VIP matches are sought to start the pose estimation algorithm. Corresponding features can be determined efficiently by using the quantized visual word description: features with the same visual word are reported as matches, which only takes O(n) time, where n is the number of features.

The visual word description is also very memory efficient. The plain visual word database size is

DB_inv = 4 f I    (3)

where f is the maximum number of visual words per model and I is the number of models in the database. The factor 4 comes from the use of 4-byte integers to hold the model index where a visual word occurred. If we assume an average of 1000 visual words per model, a database containing 1 million models would only need 4 GB of RAM. In addition to the visual words we also need to store the 2D coordinates, scale and rotation of the SIFT features, plus the 3D coordinates, plane parameters and virtual camera of the VIP features, which still allows storing a huge number of models in the database.
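
A small sketch of this hash-based matching and of the Eq. (3) estimate (illustrative Python; the function name and the list-based feature encoding are our assumptions):

```python
from collections import defaultdict

def match_by_visual_word(query_words, model_words):
    """O(n) SIFT-VIP candidate matching: features whose descriptors
    quantise to the same visual word are reported as matches. Each list
    holds one word id per feature; list indices identify the features."""
    buckets = defaultdict(list)
    for j, w in enumerate(model_words):
        buckets[w].append(j)
    return [(i, j)
            for i, w in enumerate(query_words)
            for j in buckets.get(w, [])]

# Back-of-envelope for Eq. (3): 4-byte model indices, ~1000 visual words
# per model and 1e6 models give 4 * 1000 * 1_000_000 bytes = 4 GB of RAM.
print(4 * 1000 * 1_000_000 / 1e9, "GB")  # -> 4.0 GB
```
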
5. Experiments

5.1. SIFT-VIP matching results

We conducted an experiment to compare standard SIFT-SIFT matching with our proposed SIFT-VIP matching. Fig. 4(a) shows the established SIFT-SIFT matches. Only 10 matches could be detected, and many of them are actually mis-matches. When computing the initial SIFT-VIP matches, the number of correspondences increases to 25, most of them correct (see Fig. 4(b)). The proposed method, however, is able to detect 91 correct SIFT-VIP matches, as shown in Fig. 4(c). This significantly higher number of matches allows a more accurate pose estimation. Note that the matches are nicely distributed over two scene planes. Fig. 4(d) shows the resulting pose estimate in red.

Fig. 5 shows the camera position hypotheses from single SIFT-VIP matches in green; each match generates one hypothesis. The red camera is the correct camera pose. All the camera estimates are set fronto-parallel to the VIP feature in the 3D model, and therefore the camera estimates generated from the plane that is not fronto-parallel to the real camera are off. However, it can be seen that many pose hypotheses are very close to the correct solution. Each of them can be used to extend the initial SIFT-VIP matches to a larger set.

Fig. 6 shows an example with 3 scene planes. The 105 (partially incorrect) SIFT-SIFT matches get extended to 223 correct SIFT-VIP matches on all 3 scene planes. Fig. 6(b) shows examples of orthographic VIP patches. The images show, from left to right, the extracted SIFT patches from the query image, the warped SIFT patches, and the VIP patches of the 3D model. Ideally the warped SIFT patches and the VIP patches should be perfectly aligned. However, as the initial SIFT-VIP matches are not exactly fronto-parallel, the camera pose is inaccurate and the patches are not perfectly registered. But the difference is not very large, which means that our simple pose estimation works impressively well.

5.2. 3D model search performance evaluation

In this experiment we show the performance of the 3D model search. The video data to create the models was acquired with car-mounted cameras while driving through a city. Two cameras were mounted on the roof of a car; one was pointing straight sideways, the other was pointing forward at a 45° angle. The fields of view of the two cameras do not overlap, but as the system moves over time the captured scene parts will overlap. To obtain ground truth data for the camera motion, the image acquisition was synchronized with a highly accurate GPS-inertial system. Accordingly, we know the location of the camera for each video frame. In this experiment a 3D model database represented by VIP features is created from the side camera video. The database is queried with the video frames from the forward camera, which are represented by SIFT features. The database contains 113 3D models, which are queried with 2644 images. The query video frames yielded up to 5000 features per frame. The vocabulary tree used was trained on general image data from the web.

The 3D model search results are visualized by plotting lines between frame-to-3D-model matches (see Fig. 7). The identical camera paths of the forward and side camera are shifted by a small amount in x and y direction to make the matching links visible. We only draw matches below a distance threshold of 10 m, so that mis-matches get filtered out. The red markers are the query camera positions and the green markers are the 3D model positions in the database. In the figure the top-10 ranked matches are drawn. Usually one considers the top-n ranked matches as possible hypotheses and verifies the correct one geometrically; in our case this can be done by the pose estimation. Fig. 8 shows some correct example matches.

5.3. 3D model query with cell phone images

We developed an application that allows querying a 3D city-model database with arbitrary input images (see screenshot in Fig. 9). The database so far contains 851 3D models and the retrieval works in real time. Fig. 9(b) shows a query with a cell phone image. The cell phone query image has a different resolution and was taken months later; nevertheless we could match it perfectly.

Figure 4. Comparison of standard SIFT-SIFT matching and our proposed SIFT-VIP method. (a) SIFT-SIFT matches: only 10 matches could be found, most of them mis-matches. (b) Initial SIFT-VIP matches: 25 matches could be found, most of them correct. (c) Resulting set of matches established with the proposed method: the initial set of 25 matches could be extended to 91 correct matches. (d) The SIFT-VIP matches in 3D showing the estimated camera pose (red).

Figure 5. Camera pose hypotheses from SIFT-VIP matches (green). The ground truth camera pose of the query image is shown in red. Multiple hypotheses are very close to the real camera pose.

Figure 6. (a) SIFT-VIP matches and estimated camera pose for a scene with 3 planes. (b) Examples of warped SIFT patches and orthographic VIP patches. From left to right: extracted SIFT patch from the query image, warped SIFT patch, VIP patch of the 3D model. The VIP patches are impressively well aligned to the warped SIFT patches, despite the inaccuracies of the camera pose.

Figure 7. 3D model search (axes in meters). Red markers are query camera positions and green markers are the 3D model positions in the database. Lines show matches below a 10 m distance threshold. Each match should be seen as a match hypothesis which is to be verified by the geometric constraints of pose estimation.

Figure 8. Matches from the 3D model search. Left: query image from the forward camera. Right: retrieved 3D models.

Figure 9. (a) Screenshots of our 3D model search tool. The query image can be selected from a list on the left; as a result the corresponding 3D model shows up. (b) Query with an image from a camera phone.

6. Conclusion

In this paper we addressed two important topics in visual localization. Firstly, we investigated the case of 3D-2D pose estimation using VIP and SIFT features. We showed that it is possible to match images to 3D models by matching SIFT features to VIP features, and we demonstrated that the number of initial SIFT-VIP matches can be increased significantly by warping the query features into the orthographic frame of the VIP features. This increases the reliability and robustness of pose estimation. Secondly, we demonstrated a 3D model search scheme that scales efficiently up to city scale. Localization experiments with images from camera phones showed that this approach is suitable for city-wide localization from mobile devices.

References

[1] A. Akbarzadeh, J. Frahm, P. Mordohai, B. Clipp, C. Engels, D. Gallup, P. Merrell, M. Phelps, S. Sinha, B. Talton, L. Wang, Q. Yang, H. Stewenius, R. Yang, G. Welch, H. Towles, D. Nister, and M. Pollefeys. Towards urban 3D reconstruction from video. In 3D Data Processing, Visualization and Transmission, pages 1-8, 2006.
[2] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total recall: Automatic query expansion with a generative feature model for object retrieval. In Proc. 11th International Conference on Computer Vision, Rio de Janeiro, Brazil, 2007.
[3] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.
[4] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool. A comparison of affine region detectors. International Journal of Computer Vision, 65(1-2):43-72, 2005.
[5] D. Nistér and H. Stewénius. Scalable recognition with a vocabulary tree. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, New York City, New York, 2006.
[6] D. Robertson and R. Cipolla. An image-based system for urban navigation. In Proc. British Machine Vision Conference, London, UK, pages 1-10, 2004.
[7] G. Schindler, M. Brown, and R. Szeliski. City-scale location recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, Minnesota, pages 1-7, 2007.
[8] H. Shao, T. Svoboda, T. Tuytelaars, and L. Van Gool. HPAT indexing for fast object/scene recognition based on local appearance. In Conference on Image and Video Retrieval, pages 71-80, 2003.
[9] C. Wu, B. Clipp, X. Li, J.-M. Frahm, and M. Pollefeys. 3D model matching with viewpoint invariant patches (VIPs). In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[10] G. Yang, J. Becker, and C. Stewart. Estimating the location of a camera with respect to a 3D model. In 3DIM07, 2007.
[11] W. Zhang and J. Kosecka. Image based localization in urban environments. In 3D Data Processing, Visualization and Transmission, pages 33-40, 2006.
