Improving Image-Based Localization Through Increasing Correct Feature Correspondences
Guoyu Lu 1, Vincent Ly 1, Haoquan Shen 2, Abhishek Kolagunda 1, and Chandra Kambhamettu 1
1 Video/Image Modeling and Synthesis (VIMS) Lab, Department of Computer and Information Sciences, University of Delaware
2 Zhejiang University, China
Lecture Notes in Computer Science

Abstract. Image-based localization provides contextual information, such as the camera position, from a single query image. Current state-of-the-art methods use a 3D Structure-from-Motion reconstruction to localize the query image, either by 2D-to-3D matching or by 3D-to-2D matching. By adding camera pose estimation, such a system can localize an image more accurately. However, incorrect feature correspondences between the 2D image and the 3D reconstruction remain the main cause of failure in image localization. In this paper, we introduce a feature-embedding step that reduces incorrect feature correspondences, and we perform query expansion to add correspondences whose associated 3D points have a high probability of being seen by the same cameras as a seed set. Using these techniques, registration accuracy is significantly improved. Experiments on several large image datasets show that our method outperforms most state-of-the-art methods.

1 Introduction

Image-based localization is the task of estimating the camera position from a photograph. Given an image captured by a camera, an image-based localization system can compute the camera position and navigate the user. This application has attracted increasing attention in multiple areas, such as robot localization [1] and landmark recognition [2]. Image-based localization is particularly important where the GPS signal is weak, such as around large buildings. Image-based localization was first attempted by searching a database of city images: the image best matching the query is retrieved as the location indicator.
With the development of Structure-from-Motion (SfM) reconstruction techniques, a 3D model can be generated from a set of city building images. State-of-the-art image-based localization methods match a 2D query image against a 3D model built by SfM reconstruction to estimate the camera orientation, which achieves higher localization accuracy. The SIFT feature [3] is used in the image-based localization task as the local feature for determining correspondences. SIFT is invariant to scaling and rotation and partially invariant to illumination change, and it has been successfully applied to object recognition, 3D reconstruction, motion tracking and many other computer vision tasks. However, since SIFT emphasizes large bin values in the distance computation, large descriptor values can produce incorrect correspondences. Moreover, the descriptor space of a 3D SfM reconstruction is much denser than that of a 2D image, which dramatically increases the chance of incorrect correspondences and results in a less stable configuration for camera pose estimation.

In this paper, we add more high-confidence 2D-to-3D correspondences using query expansion. The 3D points of the selected correspondences have the highest probability of being seen by the same cameras as the seed points; these correspondences increase the chance of successfully estimating the camera pose. In addition, we use the Hellinger kernel for computing the distance between descriptors: instead of comparing descriptors in Euclidean space, the Hellinger kernel compares L1-normalized descriptors, which mitigates the impact of extreme bin values. The newly learnt descriptor also allows corresponding descriptors to be more reliably assigned to the same visual word, which we use to search for nearest-neighbor descriptors.

The paper is organized as follows. Section 2 reviews related work on image-based localization and descriptor learning. Section 3 introduces the image-based localization pipeline, and Section 4 our dual-checking step. Section 5 presents the query expansion method. Section 6 discusses the Hellinger kernel in SIFT descriptor similarity computation. Section 7 presents our localization results and analysis. Section 8 concludes the paper.

2 Related Work

Image-based localization matches the query image against an image database or a 3D reconstruction model. Compared to GPS navigation, image-based localization can still be employed around large buildings and provides higher localization accuracy [4].
Originally, image-based localization used a database of building-facade views, associated with a 3D coordinate system, to estimate the pose of the query image [5]. Similarly, [6] searches an image database for the closest image in descriptor space to localize in urban scenes. A vocabulary-tree method is used in [4] to achieve real-time pose estimation. Xiao et al. [7] use a bag-of-words method together with geometric verification to improve object localization accuracy. Irschara et al. [8] propose retrieving the images containing the most descriptors matching the 3D points, and [9] realizes 3D-to-2D matching through mutual visibility information. [10] uses visibility information between points and cameras to choose points for camera pose estimation. Sattler et al. [11] propose directly matching the descriptors extracted from the 2D image against the descriptors of the 3D points to improve localization accuracy.

Local features are widely used in image retrieval. To achieve better retrieval accuracy with local features, [12, 13] learn a lower-dimensional embedding from labeled match and non-match pairs. Philbin et al. [14] classify descriptor pairs into three groups (positive pairs, nearest-neighbor pairs and random negative pairs) and learn a projection matrix by minimizing a margin-based cost function over the three groups. Hashing methods are also used to reduce descriptor quantization error. Kulis et al. [15] introduce a scalable coordinate-descent algorithm that learns hash functions by minimizing the error caused by binary embedding. Yagnik et al. [16] present a feature-embedding method (WTA-Hash) based on partial order statistics, which can be extended to polynomial kernels. [17] proposes LDAHash, which learns a projection matrix resembling classic LDA and quantizes descriptors into Hamming space; the resulting lower-dimensional binary descriptors increase image retrieval accuracy. [18] adds strong spatial constraints to verify the returned images, suppressing false positives in query expansion. [19] uses a linear SVM to discriminatively learn a weight vector for re-querying, achieving a significant improvement over standard query expansion.

3 Image-Based Localization

The goal of image-based localization is to navigate the user based on images captured by a mobile device. The user takes a photo of his or her surroundings and sends it to the image-based localization system. By matching the query image against the system's 3D model and estimating the camera pose, the user receives navigation and location information, as shown in Figure 1.

Fig. 1: 2D-to-3D image matching

Image-based localization systems originally searched an image database for the best candidate, i.e., the image with the most feature correspondences. With the development of Structure-from-Motion reconstruction, 3D models are used in image-based localization, which allows better orientation estimation. Compared to 2D images, the descriptor space of a 3D SfM reconstruction is much denser. Our image-based localization method uses direct 2D-to-3D matching [11]: the basic idea is to find correspondences between the 2D features and the 3D points.
Correspondences are determined by searching for each 2D descriptor's nearest neighbors among all descriptors of the 3D model. To accelerate matching, all 3D descriptors are clustered into 100k visual words using the k-means algorithm [20]. Each descriptor extracted from the 2D image is assigned to one of the visual words, which restricts the search for nearest-neighbor descriptors. The search ends once enough correspondences are found.
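The visual-word quantization described above can be sketched as follows. This is a toy illustration, not the paper's implementation: the descriptors are random stand-ins for 128-D SIFT vectors, and 64 words replace the 100k used in the pipeline.

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

rng = np.random.default_rng(0)

# Stand-ins for SIFT descriptors (128-D); real ones would come from the SfM
# model and from the query image.
descriptors_3d = rng.random((1000, 128)).astype(np.float32)
descriptors_2d = rng.random((50, 128)).astype(np.float32)

# Cluster the 3D descriptors into visual words (100k in the paper; 64 here).
centroids, word_of_3d = kmeans2(descriptors_3d, k=64, minit="++", seed=0)

# Assign each 2D query descriptor to its nearest visual-word centroid.
word_of_2d, _ = vq(descriptors_2d, centroids)

# Candidate matches for a query descriptor are only the 3D descriptors that
# fall in the same visual word, which is what makes the search tractable.
q = 0
candidates = np.flatnonzero(word_of_3d == word_of_2d[q])
print(len(candidates))
```

Restricting nearest-neighbor search to one word trades a small amount of recall for a large constant-factor speedup, which is why the clustering can be done once, offline.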
A kd-tree-based approach, supported by the FLANN library [21], is used to find approximate nearest-neighbor descriptors. Each 3D point is represented by the mean of all descriptors belonging to that point. As with the 3D descriptors, descriptors in the 2D image are assigned to visual words using the same centroids, and 2D-to-3D correspondences are searched for within the same visual word. A 2D-to-3D correspondence is accepted if the two nearest neighbors of the 2D query descriptor pass the SIFT ratio test. If more than one 2D feature matches the same 3D point, the 2D feature with the smallest Euclidean distance is selected. Once 100 correspondences are found, we stop searching; these correspondences are used in the later pose estimation. The threshold of 100 is chosen to balance registration speed against registration accuracy. Only images with at least 12 inliers are registered; a registered image is one correctly matched to the 3D model so that its camera pose is known. Inliers are found by the Random Sample Consensus (RANSAC) algorithm [22], using the 6-point direct linear transformation (6-point DLT) [23] to estimate the camera pose. In L2 space, a correspondence between a 2D feature and a 3D point (2df, 3dp) is accepted if the squared distances to the 2D feature's two nearest-neighbor 3D descriptors satisfy

D(2df, 3dp_1) / D(2df, 3dp_2) < \tau    (1)

D(2df, 3dp) = \sum_{i=1}^{d} (2df_i - 3dp_i)^2    (2)

In Equation 2, 2df and 3dp denote the descriptors belonging to the 2D feature and the 3D point, d is the descriptor dimensionality, and \tau is the ratio-test threshold on squared distances.

4 Dual Checking

We add a reverse-checking step to further filter out unstable correspondences. Each 3D point that passes the ratio test in the previous step is matched back to features in the 2D image.
We record the distance between the mean descriptor of the 3D point and its nearest-neighbor 2D descriptor, as well as the distance to the second-nearest-neighbor 2D descriptor. The correspondence is accepted if the 3D point passes the ratio test:

D(3dp, 2df_1) / D(3dp, 2df_2) < 0.64    (3)

D(3dp, 2df) = \sum_{i=1}^{d} (3dp_i - 2df_i)^2    (4)
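The bidirectional test can be sketched as below. The toy descriptors are 2-D stand-ins for 128-D SIFT vectors; the reverse threshold 0.64 follows Equation 3, while the forward threshold value is not stated in the text, so t_fwd = 0.49 here is an assumption.

```python
import numpy as np

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors (Eqs. 2 and 4)."""
    return float(((a - b) ** 2).sum())

def ratio_test(q, others, thresh):
    """SIFT ratio test on squared distances: the nearest neighbor of q among
    `others` must be sufficiently closer than the second nearest."""
    d = sorted(sq_dist(q, o) for o in others)
    return d[0] / d[1] < thresh

def dual_check(feat_2d, all_feats_2d, point_3d, all_points_3d,
               t_fwd=0.49, t_rev=0.64):
    """Accept the (2D feature, 3D point) pair only if the ratio test passes
    in both directions (t_fwd is assumed; t_rev = 0.64 follows Eq. 3)."""
    return (ratio_test(feat_2d, all_points_3d, t_fwd) and
            ratio_test(point_3d, all_feats_2d, t_rev))

points_3d = [np.array([0., 0.]), np.array([10., 10.]), np.array([20., 0.])]
feats_2d = [np.array([0.1, 0.]), np.array([9., 9.]), np.array([30., 30.])]

print(dual_check(feats_2d[0], feats_2d, points_3d[0], points_3d))   # → True
print(dual_check(np.array([5., 5.]), feats_2d, points_3d[0], points_3d))  # → False
```

The second call fails because the query sits equidistant from two 3D points, so the forward ratio is 1.0 — exactly the kind of ambiguous correspondence the test is meant to reject.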
In [11], the authors rank the visual words by size (the number of pairs formed by a 3D point and a 2D descriptor) from small to large to reduce the search cost. In our paper, we instead count the number of 3D points in each visual word and sort the visual words in decreasing order, since experiments show that visual words containing more points are more likely to yield 3D points that pass the SIFT ratio test in both directions. Within each visual word, points are sorted by the number of cameras in which the point is visible: the more cameras see a point, the higher its search priority.

5 Query Expansion

Query expansion is widely used in web search engines to augment search results by adding keywords. We use query expansion in our localization pipeline to augment the list of possible correspondences. Inspired by the point-selection method based on joint camera visibility [10], we treat the 3D points of all currently accepted 2D-to-3D correspondences as base seeds to be expanded. Since multiple points can be seen by the same camera, we choose points that are jointly visible with the base seeds, as in Equation 5:

Prob(P1, P2) = \sum_{P1 \in S} |C(P1) \cap C(P2)| / N    (5)

Here S is the base seed set, P1 is a 3D point in the seed set, and P2 is a candidate 3D point outside the seed set; C(P) denotes the set of cameras that see point P, and N is the total number of cameras. Prob is the probability that the two points are visible in the same camera, computed as the number of cameras that see both P1 and P2 divided by the total number of cameras. We define a ratio thresholding the minimum joint visibility a point must reach before being considered. Among the points passing this threshold, points are ranked by the sum of their physical distances to all points in the base seed set, as in Equation 6.
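The joint-visibility score (Eq. 5) and the seed-distance ranking (Eq. 6) can be sketched on toy data as follows; the visibility sets, point coordinates, and the 0.2 threshold are all made up for illustration.

```python
import numpy as np

# Hypothetical SfM visibility data: cameras_of[p] is the set of camera ids
# observing 3D point p, and xyz[p] is the point's 3D position.
cameras_of = {0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2, 3}, 3: {3}}
xyz = {0: np.array([0., 0., 0.]), 1: np.array([1., 0., 0.]),
       2: np.array([5., 5., 0.]), 3: np.array([9., 9., 9.])}
n_cameras = 4
seeds = {0}  # 3D points of the currently accepted correspondences

def covis_prob(p2, seeds):
    """Eq. 5: summed fraction of cameras seeing both a seed and candidate p2."""
    return sum(len(cameras_of[p1] & cameras_of[p2]) for p1 in seeds) / n_cameras

def seed_distance(p2, seeds):
    """Eq. 6: summed Euclidean distance from candidate p2 to every seed."""
    return sum(np.linalg.norm(xyz[p1] - xyz[p2]) for p1 in seeds)

# Keep candidates likely to be co-visible with the seeds, then rank them so
# that points far from the seed set come first (they stabilize the pose).
candidates = [p for p in cameras_of if p not in seeds
              and covis_prob(p, seeds) > 0.2]
expansion = sorted(candidates, key=lambda p: seed_distance(p, seeds),
                   reverse=True)
print(expansion)  # → [2, 1]
```

Point 3 is dropped because no camera sees it together with the seed; of the survivors, point 2 outranks point 1 because it lies farther from the seed set.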
The point with the largest summed distance is given the highest priority, as widely separated points aid pose estimation. We select the 100 highest-priority points and search for their nearest-neighbor features in the 2D image as correspondences:

Distance(P1, P2) = \sum_{P1 \in S} \sqrt{(P1_x - P2_x)^2 + (P1_y - P2_y)^2 + (P1_z - P2_z)^2}    (6)

Here the distance is measured in 3D coordinates, and P1_x, P1_y and P1_z denote the coordinates of P1 in the X, Y and Z directions.

6 Feature Learning

Since the SIFT feature is used for the Structure-from-Motion reconstruction of our 3D model, we also use SIFT in our image-based localization task. SIFT is commonly used in image retrieval due to its useful properties: invariance to rotation, scaling and illumination change. However, compared to common image retrieval tasks, image-based localization is more challenging because the descriptors of the 3D model are much denser than those extracted from 2D images. This greater density yields many matches over a smaller region, adding a large number of incorrect correspondences, and more incorrect correspondences lead to poorer camera pose estimation. In [24], the authors note that only a few components of a SIFT descriptor dominate the similarity computation; additionally, sign information is lost with L2-normalized descriptors. For these reasons, SIFT still falls short of providing sufficient correct correspondences for image-based localization. To overcome these limitations, [19] proposed using the Hellinger kernel to compare descriptors. Instead of computing the Euclidean distance as in Equation 7,

D(X, Y) = \|X - Y\|^2 = \sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n} y_i^2 - 2 \sum_{i=1}^{n} x_i y_i    (7)

the similarity between two descriptors is calculated as in Equation 8:

H(X, Y) = \sum_{i=1}^{n} \sqrt{x_i y_i}    (8)

X and Y are two descriptor vectors, with components x_i and y_i. SIFT originally uses L2 normalization; with the Hellinger kernel, we instead L1-normalize the SIFT descriptors before comparing two vectors. The Hellinger kernel reduces the influence of large bin values while making small bin values more substantial, which helps reject incorrect feature correspondences.

7 Experimental Results

To evaluate the performance of our proposed method, we conducted experiments using the new learnt descriptors, projected via the Hellinger kernel, on three challenging datasets: Dubrovnik [9], Vienna [8] and the Aachen dataset [25].
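Concretely, the Hellinger-kernel comparison of Section 6 amounts to L1-normalizing each descriptor and taking the element-wise square root (the "RootSIFT" trick of [19]), after which plain Euclidean distance reproduces the kernel. A minimal sketch with toy descriptor values (real inputs would be non-negative 128-D SIFT histograms):

```python
import numpy as np

def hellinger_embed(desc, eps=1e-12):
    """L1-normalize a (non-negative) SIFT descriptor, then take element-wise
    square roots, so that dot products in the new space equal the Hellinger
    kernel of Eq. 8."""
    desc = np.asarray(desc, dtype=np.float64)
    return np.sqrt(desc / (desc.sum() + eps))

x = np.array([100., 4., 1., 0.])   # toy descriptor with one dominant bin
y = np.array([90., 10., 2., 3.])

hx, hy = hellinger_embed(x), hellinger_embed(y)

# The embedded vectors are L2-normalized by construction ...
assert np.isclose((hx ** 2).sum(), 1.0, atol=1e-6)
# ... and their dot product equals the Hellinger kernel (Eq. 8) of the
# L1-normalized inputs, so Euclidean distance on hx, hy compares descriptors
# under the Hellinger kernel.
k = np.sum(np.sqrt((x / x.sum()) * (y / y.sum())))
assert np.isclose(hx @ hy, k)
```

Because the square root compresses large bins and expands small ones, the dominant bin no longer drowns out the rest of the descriptor in the distance computation, which is exactly the effect the text describes.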
Dubrovnik is a large dataset whose 3D model is reconstructed from Flickr photos. Some images are removed from the reconstruction, together with their descriptors and the 3D points seen by only one camera; these removed images serve as the query images. The query images of Vienna have a maximum dimension of 1600 pixels in width and height; the 266 query images for the Vienna dataset are selected from the Panoramio website. The images in the Aachen dataset were collected over a two-year period with different cameras. The query images are free of typical mobile-phone camera shortcomings, such as motion blur and lack of focus. The datasets represent several different scenarios: the Vienna images are taken at uniform intervals of urban scenes; Dubrovnik depicts the large, clustered sets of views typically found on Internet photo-collection websites; and Aachen contains different lighting and weather conditions, as well as occlusions by construction sites. Detailed information is given in Table 1.

Dataset     #3D points   #descriptors   Size (MB)   #query images
Dubrovnik   1,886,884    9,606,317      1,…         …
Vienna      1,123,028    4,854,…        …           266
Aachen      1,540,786    7,281,501      1,…         …

Table 1: The datasets used for evaluation. Size describes the size of the binary .info file containing all descriptors and 3D point information.

In the 2D-to-3D localization pipeline, all 3D descriptors are assigned to visual words. The query descriptors of the 2D image are likewise assigned to the visual word whose center is nearest. After a query descriptor is assigned to a visual word, its correspondence is found via nearest-neighbor search. Using the new descriptors learned with the Hellinger kernel, we re-cluster all descriptors into visual words with k-means and search for correspondences through the newly learned visual words. Results compared with state-of-the-art methods are shown in Table 2.

Method                              Dubrovnik   Vienna   Aachen
P2F [9]                             …           …        …
Voc. tree (all) [9]                 …           …        …
Fast Direct 2D-to-3D [11]           …           …        …
Voc. tree GPU [8]                   …           …        …
2D-to-3D Hellinger kernel (ours)    …           …        …

Table 2: Registered images for our method and for state-of-the-art methods.

From Table 2, the new system outperforms most state-of-the-art methods in localization accuracy. The newly learnt descriptor requires no additional memory, and since learning the descriptors and forming the visual words can be done offline, the new system does not slow down the localization of an image. Figure 2 shows examples of images that are registered by the new localization pipeline.
These images fail to be registered in the original pipeline. As the examples show, images with shadow and even large rotation can be registered with our method, yet fail with the old system. Figure 3 shows examples that fail in the new system: localization fails for images with significant illumination change, and images largely dominated by people also fail to register. In these cases, the salient parts of the image yield no features corresponding to those in the 3D reconstruction model, which causes camera pose estimation to fail.
Fig. 2: Image examples (a–f) registered by the new localization pipeline; none of them can be registered by the original pipeline.

Fig. 3: Image examples (a–f) that fail to be registered by the new localization pipeline; their camera pose estimation is unsuccessful.
8 Conclusion

In this paper, we perform two-way feature matching between 3D points and 2D image features (from 2D to 3D and then from 3D to 2D), which yields reliable correspondences and a seed set. We then add a query expansion step that augments our initial list of correspondences with correspondences whose 3D points have a high probability of being jointly visible with a seed point in an image; these correspondences benefit the camera pose estimation in the final step. We also use the Hellinger kernel to learn new descriptors and apply them to image-based localization, which is much more challenging than common image retrieval problems. Without requiring additional time or memory, our system dramatically improves localization accuracy. We expect that the image registration rate and speed can be further improved by pruning less informative points.

9 Acknowledgement

This work was made possible by an NSF CDI-Type I grant.

References

1. Meier, L., Tanskanen, P., Fraundorfer, F., Pollefeys, M.: Pixhawk: A system for autonomous flight using onboard computer vision. In: ICRA (2011)
2. Chen, D., Baatz, G., Koser, K., Tsai, S., Vedantham, R., Pylvanainen, T., Roimela, K., Chen, X., Bach, J., Pollefeys, M., Girod, B., Grzeszczuk, R.: City-scale landmark identification on mobile devices. In: CVPR (2011)
3. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. IJCV 60 (2004)
4. Steinhoff, U., O., D., Perko, R., Schiele, B., Leonardis, A.: How computer vision can help in outdoor positioning. In: European Conference on Ambient Intelligence, AmI'07 (2007)
5. Robertson, D., Cipolla, R.: An image-based system for urban navigation. In: BMVC (2004)
6. Zhang, W., Kosecka, J.: Image based localization in urban environments. In: 3DPVT (2006)
7. Xiao, J., Chen, J., Yeung, D., Quan, L.: Structuring visual words in 3d for arbitrary-view object localization. In: ECCV (2008)
8. Irschara, A., Zach, C., Frahm, J., Bischof, H.: From structure-from-motion point clouds to fast location recognition. In: CVPR (2009)
9. Li, Y., Snavely, N., Huttenlocher, D.P.: Location recognition using prioritized feature matching. In: ECCV (2010)
10. Choudhary, S., Narayanan, P.: Visibility probability structure from sfm datasets and applications. In: ECCV (2012)
11. Sattler, T., Leibe, B., Kobbelt, L.: Fast image-based localization using direct 2d-to-3d matching. In: ICCV (2011)
12. Hua, G., Brown, M., Winder, S.: Discriminant embedding for local image descriptors. In: ICCV (2007)
13. Winder, S., Hua, G., Brown, M.: Picking the best daisy. In: CVPR (2009)
14. Philbin, J., Isard, M., Sivic, J., Zisserman, A.: Descriptor learning for efficient retrieval. In: ECCV (2010)
15. Kulis, B., Darrell, T.: Learning to hash with binary reconstructive embeddings. In: NIPS (2009)
16. Yagnik, J., Strelow, D., Ross, D., Lin, R.-S.: The power of comparative reasoning. In: ICCV (2011)
17. Strecha, C., Bronstein, A., Bronstein, M., Fua, P.: LDAHash: Improved matching with smaller descriptors. TPAMI 34 (2012)
18. Chum, O., Philbin, J., Sivic, J., Isard, M., Zisserman, A.: Total recall: Automatic query expansion with a generative feature model for object retrieval. In: ICCV (2007)
19. Arandjelovic, R., Zisserman, A.: Three things everyone should know to improve object retrieval. In: CVPR (2012)
20. Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: CVPR (2007)
21. Muja, M., Lowe, D.: Fast approximate nearest neighbors with automatic algorithm configuration. In: VISAPP (2009)
22. Fischler, M.A., Bolles, R.C.: Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24 (1981)
23. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press (2004)
24. Jain, M., Benmokhtar, R., Gros, P., Jegou, H.: Hamming embedding similarity-based image classification. In: ICMR (2012)
25. Sattler, T., Weyand, T., Leibe, B., Kobbelt, L.: Image retrieval for image-based localization revisited. In: BMVC (2012)
Local features: detection and description May 12 th, 2015 Yong Jae Lee UC Davis Announcements PS1 grades up on SmartSite PS1 stats: Mean: 83.26 Standard Dev: 28.51 PS2 deadline extended to Saturday, 11:59
More informationLarge scale object/scene recognition
Large scale object/scene recognition Image dataset: > 1 million images query Image search system ranked image list Each image described by approximately 2000 descriptors 2 10 9 descriptors to index! Database
More informationHamming embedding and weak geometric consistency for large scale image search
Hamming embedding and weak geometric consistency for large scale image search Herve Jegou, Matthijs Douze, and Cordelia Schmid INRIA Grenoble, LEAR, LJK firstname.lastname@inria.fr Abstract. This paper
More informationA Scalable Collaborative Online System for City Reconstruction
A Scalable Collaborative Online System for City Reconstruction Ole Untzelmann, Torsten Sattler, Sven Middelberg and Leif Kobbelt RWTH Aachen University ole.untzelmann@rwth-aachen.de, {tsattler, middelberg,
More informationK-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors
K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors Shao-Tzu Huang, Chen-Chien Hsu, Wei-Yen Wang International Science Index, Electrical and Computer Engineering waset.org/publication/0007607
More informationLocal features: detection and description. Local invariant features
Local features: detection and description Local invariant features Detection of interest points Harris corner detection Scale invariant blob detection: LoG Description of local patches SIFT : Histograms
More informationarxiv: v1 [cs.cv] 28 Sep 2018
Extrinsic camera calibration method and its performance evaluation Jacek Komorowski 1 and Przemyslaw Rokita 2 arxiv:1809.11073v1 [cs.cv] 28 Sep 2018 1 Maria Curie Sklodowska University Lublin, Poland jacek.komorowski@gmail.com
More informationAutomatic Image Alignment (feature-based)
Automatic Image Alignment (feature-based) Mike Nese with a lot of slides stolen from Steve Seitz and Rick Szeliski 15-463: Computational Photography Alexei Efros, CMU, Fall 2006 Today s lecture Feature
More informationAn Evaluation of Two Automatic Landmark Building Discovery Algorithms for City Reconstruction
An Evaluation of Two Automatic Landmark Building Discovery Algorithms for City Reconstruction Tobias Weyand, Jan Hosang, and Bastian Leibe UMIC Research Centre, RWTH Aachen University {weyand,hosang,leibe}@umic.rwth-aachen.de
More informationImage Retrieval with a Visual Thesaurus
2010 Digital Image Computing: Techniques and Applications Image Retrieval with a Visual Thesaurus Yanzhi Chen, Anthony Dick and Anton van den Hengel School of Computer Science The University of Adelaide
More informationVisual localization using global visual features and vanishing points
Visual localization using global visual features and vanishing points Olivier Saurer, Friedrich Fraundorfer, and Marc Pollefeys Computer Vision and Geometry Group, ETH Zürich, Switzerland {saurero,fraundorfer,marc.pollefeys}@inf.ethz.ch
More informationPreviously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011
Previously Part-based and local feature models for generic object recognition Wed, April 20 UT-Austin Discriminative classifiers Boosting Nearest neighbors Support vector machines Useful for object recognition
More informationFROM COARSE TO FINE: QUICKLY AND ACCURATELY OBTAINING INDOOR IMAGE-BASED LOCALIZATION UNDER VARIOUS ILLUMINATIONS. Guoyu Lu
FROM COARSE TO FINE: QUICKLY AND ACCURATELY OBTAINING INDOOR IMAGE-BASED LOCALIZATION UNDER VARIOUS ILLUMINATIONS by Guoyu Lu A dissertation submitted to the Faculty of the University of Delaware in partial
More informationEvaluation of GIST descriptors for web scale image search
Evaluation of GIST descriptors for web scale image search Matthijs Douze Hervé Jégou, Harsimrat Sandhawalia, Laurent Amsaleg and Cordelia Schmid INRIA Grenoble, France July 9, 2009 Evaluation of GIST for
More informationLocal Image Features
Local Image Features Computer Vision Read Szeliski 4.1 James Hays Acknowledgment: Many slides from Derek Hoiem and Grauman&Leibe 2008 AAAI Tutorial Flashed Face Distortion 2nd Place in the 8th Annual Best
More informationThe SIFT (Scale Invariant Feature
The SIFT (Scale Invariant Feature Transform) Detector and Descriptor developed by David Lowe University of British Columbia Initial paper ICCV 1999 Newer journal paper IJCV 2004 Review: Matt Brown s Canonical
More informationRobotics Programming Laboratory
Chair of Software Engineering Robotics Programming Laboratory Bertrand Meyer Jiwon Shin Lecture 8: Robot Perception Perception http://pascallin.ecs.soton.ac.uk/challenges/voc/databases.html#caltech car
More informationFeature Based Registration - Image Alignment
Feature Based Registration - Image Alignment Image Registration Image registration is the process of estimating an optimal transformation between two or more images. Many slides from Alexei Efros http://graphics.cs.cmu.edu/courses/15-463/2007_fall/463.html
More informationLocation Recognition using Prioritized Feature Matching
Location Recognition using Prioritized Feature Matching Yunpeng Li Noah Snavely Daniel P. Huttenlocher Department of Computer Science, Cornell University, Ithaca, NY 14853 {yuli,snavely,dph}@cs.cornell.edu
More informationHandling Urban Location Recognition as a 2D Homothetic Problem
Handling Urban Location Recognition as a 2D Homothetic Problem Georges Baatz 1, Kevin Köser 1, David Chen 2, Radek Grzeszczuk 3, and Marc Pollefeys 1 1 Department of Computer Science, ETH Zurich, Switzerland
More informationHomographies and RANSAC
Homographies and RANSAC Computer vision 6.869 Bill Freeman and Antonio Torralba March 30, 2011 Homographies and RANSAC Homographies RANSAC Building panoramas Phototourism 2 Depth-based ambiguity of position
More informationMultiple-Choice Questionnaire Group C
Family name: Vision and Machine-Learning Given name: 1/28/2011 Multiple-Choice naire Group C No documents authorized. There can be several right answers to a question. Marking-scheme: 2 points if all right
More informationMidterm Wed. Local features: detection and description. Today. Last time. Local features: main components. Goal: interest operator repeatability
Midterm Wed. Local features: detection and description Monday March 7 Prof. UT Austin Covers material up until 3/1 Solutions to practice eam handed out today Bring a 8.5 11 sheet of notes if you want Review
More informationLecture 12 Recognition
Institute of Informatics Institute of Neuroinformatics Lecture 12 Recognition Davide Scaramuzza 1 Lab exercise today replaced by Deep Learning Tutorial Room ETH HG E 1.1 from 13:15 to 15:00 Optional lab
More informationMethods for Representing and Recognizing 3D objects
Methods for Representing and Recognizing 3D objects part 1 Silvio Savarese University of Michigan at Ann Arbor Object: Building, 45º pose, 8-10 meters away Object: Person, back; 1-2 meters away Object:
More informationPhoto Tourism: Exploring Photo Collections in 3D
Photo Tourism: Exploring Photo Collections in 3D SIGGRAPH 2006 Noah Snavely Steven M. Seitz University of Washington Richard Szeliski Microsoft Research 2006 2006 Noah Snavely Noah Snavely Reproduced with
More informationSIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014
SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014 SIFT SIFT: Scale Invariant Feature Transform; transform image
More informationarxiv: v1 [cs.cv] 26 Dec 2013
Finding More Relevance: Propagating Similarity on Markov Random Field for Image Retrieval Peng Lu a, Xujun Peng b, Xinshan Zhu c, Xiaojie Wang a arxiv:1312.7085v1 [cs.cv] 26 Dec 2013 a Beijing University
More informationEnhanced and Efficient Image Retrieval via Saliency Feature and Visual Attention
Enhanced and Efficient Image Retrieval via Saliency Feature and Visual Attention Anand K. Hase, Baisa L. Gunjal Abstract In the real world applications such as landmark search, copy protection, fake image
More informationFuzzy based Multiple Dictionary Bag of Words for Image Classification
Available online at www.sciencedirect.com Procedia Engineering 38 (2012 ) 2196 2206 International Conference on Modeling Optimisation and Computing Fuzzy based Multiple Dictionary Bag of Words for Image
More informationPerception IV: Place Recognition, Line Extraction
Perception IV: Place Recognition, Line Extraction Davide Scaramuzza University of Zurich Margarita Chli, Paul Furgale, Marco Hutter, Roland Siegwart 1 Outline of Today s lecture Place recognition using
More informationViewpoint Invariant Features from Single Images Using 3D Geometry
Viewpoint Invariant Features from Single Images Using 3D Geometry Yanpeng Cao and John McDonald Department of Computer Science National University of Ireland, Maynooth, Ireland {y.cao,johnmcd}@cs.nuim.ie
More informationSUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS
SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS Cognitive Robotics Original: David G. Lowe, 004 Summary: Coen van Leeuwen, s1460919 Abstract: This article presents a method to extract
More informationLocal Image Features
Local Image Features Computer Vision CS 143, Brown Read Szeliski 4.1 James Hays Acknowledgment: Many slides from Derek Hoiem and Grauman&Leibe 2008 AAAI Tutorial This section: correspondence and alignment
More informationBeyond bags of features: Adding spatial information. Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba
Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba Adding spatial information Forming vocabularies from pairs of nearby features doublets
More informationDeterminant of homography-matrix-based multiple-object recognition
Determinant of homography-matrix-based multiple-object recognition 1 Nagachetan Bangalore, Madhu Kiran, Anil Suryaprakash Visio Ingenii Limited F2-F3 Maxet House Liverpool Road Luton, LU1 1RS United Kingdom
More informationPart-based and local feature models for generic object recognition
Part-based and local feature models for generic object recognition May 28 th, 2015 Yong Jae Lee UC Davis Announcements PS2 grades up on SmartSite PS2 stats: Mean: 80.15 Standard Dev: 22.77 Vote on piazza
More informationStructuring a Sharded Image Retrieval Database
Structuring a Sharded Image Retrieval Database Eric Liang and Avideh Zakhor Department of Electrical Engineering and Computer Science, University of California, Berkeley {ekhliang, avz}@eecs.berkeley.edu
More informationA Novel Method for Image Retrieving System With The Technique of ROI & SIFT
A Novel Method for Image Retrieving System With The Technique of ROI & SIFT Mrs. Dipti U.Chavan¹, Prof. P.B.Ghewari² P.G. Student, Department of Electronics & Tele. Comm. Engineering, Ashokrao Mane Group
More informationIntroduction. Introduction. Related Research. SIFT method. SIFT method. Distinctive Image Features from Scale-Invariant. Scale.
Distinctive Image Features from Scale-Invariant Keypoints David G. Lowe presented by, Sudheendra Invariance Intensity Scale Rotation Affine View point Introduction Introduction SIFT (Scale Invariant Feature
More informationVideo Google: A Text Retrieval Approach to Object Matching in Videos
Video Google: A Text Retrieval Approach to Object Matching in Videos Josef Sivic, Frederik Schaffalitzky, Andrew Zisserman Visual Geometry Group University of Oxford The vision Enable video, e.g. a feature
More informationPlace Recognition and Online Learning in Dynamic Scenes with Spatio-Temporal Landmarks
LEARNING IN DYNAMIC SCENES WITH SPATIO-TEMPORAL LANDMARKS 1 Place Recognition and Online Learning in Dynamic Scenes with Spatio-Temporal Landmarks Edward Johns and Guang-Zhong Yang ej09@imperial.ac.uk
More informationLearning a Fine Vocabulary
Learning a Fine Vocabulary Andrej Mikulík, Michal Perdoch, Ondřej Chum, and Jiří Matas CMP, Dept. of Cybernetics, Faculty of EE, Czech Technical University in Prague Abstract. We present a novel similarity
More informationEE368 Project Report CD Cover Recognition Using Modified SIFT Algorithm
EE368 Project Report CD Cover Recognition Using Modified SIFT Algorithm Group 1: Mina A. Makar Stanford University mamakar@stanford.edu Abstract In this report, we investigate the application of the Scale-Invariant
More informationEfficient Representation of Local Geometry for Large Scale Object Retrieval
Efficient Representation of Local Geometry for Large Scale Object Retrieval Michal Perďoch Ondřej Chum and Jiří Matas Center for Machine Perception Czech Technical University in Prague IEEE Computer Society
More informationEFFECTIVE FISHER VECTOR AGGREGATION FOR 3D OBJECT RETRIEVAL
EFFECTIVE FISHER VECTOR AGGREGATION FOR 3D OBJECT RETRIEVAL Jean-Baptiste Boin, André Araujo, Lamberto Ballan, Bernd Girod Department of Electrical Engineering, Stanford University, CA Media Integration
More informationLecture 24: Image Retrieval: Part II. Visual Computing Systems CMU , Fall 2013
Lecture 24: Image Retrieval: Part II Visual Computing Systems Review: K-D tree Spatial partitioning hierarchy K = dimensionality of space (below: K = 2) 3 2 1 3 3 4 2 Counts of points in leaf nodes Nearest
More informationObject Recognition and Augmented Reality
11/02/17 Object Recognition and Augmented Reality Dali, Swans Reflecting Elephants Computational Photography Derek Hoiem, University of Illinois Last class: Image Stitching 1. Detect keypoints 2. Match
More informationEfficient Re-ranking in Vocabulary Tree based Image Retrieval
Efficient Re-ranking in Vocabulary Tree based Image Retrieval Xiaoyu Wang 2, Ming Yang 1, Kai Yu 1 1 NEC Laboratories America, Inc. 2 Dept. of ECE, Univ. of Missouri Cupertino, CA 95014 Columbia, MO 65211
More informationCS 4495 Computer Vision A. Bobick. CS 4495 Computer Vision. Features 2 SIFT descriptor. Aaron Bobick School of Interactive Computing
CS 4495 Computer Vision Features 2 SIFT descriptor Aaron Bobick School of Interactive Computing Administrivia PS 3: Out due Oct 6 th. Features recap: Goal is to find corresponding locations in two images.
More informationRegion Graphs for Organizing Image Collections
Region Graphs for Organizing Image Collections Alexander Ladikos 1, Edmond Boyer 2, Nassir Navab 1, and Slobodan Ilic 1 1 Chair for Computer Aided Medical Procedures, Technische Universität München 2 Perception
More informationCS 4495 Computer Vision Classification 3: Bag of Words. Aaron Bobick School of Interactive Computing
CS 4495 Computer Vision Classification 3: Bag of Words Aaron Bobick School of Interactive Computing Administrivia PS 6 is out. Due Tues Nov 25th, 11:55pm. One more assignment after that Mea culpa This
More informationTRECVid 2013 Experiments at Dublin City University
TRECVid 2013 Experiments at Dublin City University Zhenxing Zhang, Rami Albatal, Cathal Gurrin, and Alan F. Smeaton INSIGHT Centre for Data Analytics Dublin City University Glasnevin, Dublin 9, Ireland
More informationLocal invariant features
Local invariant features Tuesday, Oct 28 Kristen Grauman UT-Austin Today Some more Pset 2 results Pset 2 returned, pick up solutions Pset 3 is posted, due 11/11 Local invariant features Detection of interest
More informationMotion Estimation and Optical Flow Tracking
Image Matching Image Retrieval Object Recognition Motion Estimation and Optical Flow Tracking Example: Mosiacing (Panorama) M. Brown and D. G. Lowe. Recognising Panoramas. ICCV 2003 Example 3D Reconstruction
More informationGeneralized RANSAC framework for relaxed correspondence problems
Generalized RANSAC framework for relaxed correspondence problems Wei Zhang and Jana Košecká Department of Computer Science George Mason University Fairfax, VA 22030 {wzhang2,kosecka}@cs.gmu.edu Abstract
More informationA Rapid Automatic Image Registration Method Based on Improved SIFT
Available online at www.sciencedirect.com Procedia Environmental Sciences 11 (2011) 85 91 A Rapid Automatic Image Registration Method Based on Improved SIFT Zhu Hongbo, Xu Xuejun, Wang Jing, Chen Xuesong,
More informationOnline Learning of Binary Feature Indexing for Real-time SLAM Relocalization
Online Learning of Binary Feature Indexing for Real-time SLAM Relocalization Youji Feng 1, Yihong Wu 1, Lixin Fan 2 1 Institute of Automation, Chinese Academy of Sciences 2 Nokia Research Center, Tampere
More informationLight-Weight Spatial Distribution Embedding of Adjacent Features for Image Search
Light-Weight Spatial Distribution Embedding of Adjacent Features for Image Search Yan Zhang 1,2, Yao Zhao 1,2, Shikui Wei 3( ), and Zhenfeng Zhu 1,2 1 Institute of Information Science, Beijing Jiaotong
More informationLearning Affine Robust Binary Codes Based on Locality Preserving Hash
Learning Affine Robust Binary Codes Based on Locality Preserving Hash Wei Zhang 1,2, Ke Gao 1, Dongming Zhang 1, and Jintao Li 1 1 Advanced Computing Research Laboratory, Beijing Key Laboratory of Mobile
More informationSpatial Coding for Large Scale Partial-Duplicate Web Image Search
Spatial Coding for Large Scale Partial-Duplicate Web Image Search Wengang Zhou, Yijuan Lu 2, Houqiang Li, Yibing Song, Qi Tian 3 Dept. of EEIS, University of Science and Technology of China, Hefei, P.R.
More informationCompressed local descriptors for fast image and video search in large databases
Compressed local descriptors for fast image and video search in large databases Matthijs Douze2 joint work with Hervé Jégou1, Cordelia Schmid2 and Patrick Pérez3 1: INRIA Rennes, TEXMEX team, France 2:
More informationLocal Features: Detection, Description & Matching
Local Features: Detection, Description & Matching Lecture 08 Computer Vision Material Citations Dr George Stockman Professor Emeritus, Michigan State University Dr David Lowe Professor, University of British
More information