A Fast and Accurate Feature-Matching Algorithm for Minimally-Invasive Endoscopic Images


IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 32, NO. 7, JULY 2013

A Fast and Accurate Feature-Matching Algorithm for Minimally-Invasive Endoscopic Images

Gustavo A. Puerto-Souza* and Gian-Luca Mariottini

Abstract: The ability to find image similarities between two distinct endoscopic views is known as feature matching, and is essential in many robotic-assisted minimally-invasive surgery (MIS) applications. Differently from feature-tracking methods, feature matching does not make any restrictive assumption about the chronological order between the two images or about the organ motion, but first obtains a set of appearance-based image matches, and subsequently removes possible outliers based on geometric constraints. As a consequence, feature-matching algorithms can be used to recover the position of any image feature after unexpected camera events, such as complete occlusions, sudden endoscopic-camera retraction, or strong illumination changes. We introduce the hierarchical multi-affine (HMA) algorithm, which improves over existing feature-matching methods because of the larger number of image correspondences, the increased speed, and the higher accuracy and robustness. We tested HMA over a large (and annotated) dataset with more than 100 MIS image pairs obtained from real interventions, and containing many of the aforementioned sudden events. In all of these cases, HMA outperforms the existing state-of-the-art methods in terms of speed, accuracy, and robustness. In addition, HMA and the image database are made freely available on the Internet.

Index Terms: Abdomen, endoscopic image analysis, endoscopy, feature matching, robust estimation.

I. INTRODUCTION

In robotic-assisted minimally-invasive surgery (MIS), the ability to find image similarities between (at least two) laparoscopic views of the same scene is crucial in many applications, such as shape recovery [1]–[3], camera calibration [4], structure and camera-motion estimation [5], [6], or augmented reality (AR) [7]–[10].
Thus far, this similarity-search problem has been addressed either by means of recursive (feature-tracking) strategies or by means of feature matching (or "tracking by detection") [11], [12]. Feature-tracking algorithms require that the features are extracted from sequential frames, and have been successfully applied to MIS in the case of small occlusions [4], [6], [13]–[15]. Feature-matching methods do not make any restrictive assumption about the chronological order of the frames to be processed, or about the scene geometry (e.g., known organ motion due to breathing). Because of these characteristics, feature-matching algorithms are of utmost importance to automatically recover those tracked features that were lost after large and sudden camera motions, complete occlusions, or strong organ deformations. In general, feature-matching algorithms initially find a set of potential matches by leveraging the appearance around distinctive image features (e.g., SIFT [16]) extracted from the image before and after the sudden camera event. Subsequently, possible ambiguities in these appearance-based matches are removed by enforcing a geometric constraint, i.e., by estimating an image mapping (e.g., rotation, translation, scale, and shear) between the two candidate feature sets.

Manuscript received November 04, 2012; revised January 02, 2013; accepted January 02, 2013. Date of publication June 26, 2013. Asterisk indicates corresponding author. *G. A. Puerto-Souza is with the Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX USA (gustavo.puerto@mavs.uta.edu). G. L. Mariottini is with the Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX USA (gianluca@uta.edu). Color versions of one or more of the figures in this paper are available online.
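The appearance-based stage just described is commonly implemented with a nearest-neighbor distance-ratio test on the feature descriptors [16]. The sketch below is only a toy illustration of that idea, not the paper's code; the descriptor values and the 0.8 ratio threshold are assumptions:

```python
import numpy as np

def nndr_matches(train_desc, query_desc, ratio=0.8):
    """Distance-ratio matching: for each query descriptor, accept its nearest
    training descriptor only if it is clearly closer than the second nearest."""
    matches = []
    for qi, q in enumerate(query_desc):
        d = np.linalg.norm(train_desc - q, axis=1)  # distances to all training descriptors
        i1, i2 = np.argsort(d)[:2]                  # indices of the two nearest neighbors
        if d[i1] < ratio * d[i2]:                   # unambiguous: keep the match
            matches.append((qi, int(i1)))
    return matches

# Toy descriptors: query 0 matches training 0 cleanly; query 1 is ambiguous
# (two training descriptors are almost equally close) and is dropped.
train = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.9]])
query = np.array([[0.95, 0.05], [0.0, 0.95]])
print(nndr_matches(train, query))  # → [(0, 0)]
```

Only the matches surviving this ratio test are handed to the geometric-validation stage, where the image mapping mentioned above is estimated.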
This mapping is of particular importance, since it can be used to predict the position of the lost tracked features in the image after the event. Because of these appealing characteristics, feature matching is of fundamental importance in many applications, such as structure from motion [5], [6], [17], registration [1]–[3], and localization. We are here particularly interested in augmented-reality (AR) applications [7], [18]–[21], which aim at increasing the surgeon's visual awareness of anatomical targets by precisely overlaying preoperative radiological data onto live laparoscopic videos. Providing a reliable and accurate feature matching is fundamental in AR to recover the position of lost anchor image points (e.g., after a complete and prolonged occlusion). In this way, feature matching can be used to automatically re-initialize the lost augmented display (cf. the example in Fig. 1) and guarantee long-term augmentations. Recently, some feature-matching strategies have been presented [16], [22], [23] that try to be robust in the presence of object deformations. However, these algorithms are still computationally cumbersome, cannot retrieve a large-enough set of accurate matches and, to the best of our knowledge, have never been evaluated in MIS scenarios. MIS images are challenging due to the presence of frequent occlusions, object deformations, and image clutter (smoke, blood, and reflections). Addressing these shortcomings is thus a primary need for the medical-imaging community. The original contribution of this work is the design of a novel feature-matching algorithm that improves over the existing methods by finding a larger number of image correspondences at an increased speed, and with both a higher accuracy and robustness to image clutter. Our method, called hierarchical multi-affine (HMA), hierarchically clusters the initial set of appearance-based matches into spatially-distributed clusters

over the organ's surface. For each of these clusters, an affine transformation is then estimated to further prune the incorrect initial matches, and to capture features spread over the entire object's surface. Each of these clusters of features is estimated according to a local geometric (affine) transformation, which maps image features from the image before the occlusion to the one after the occlusion.

Fig. 1. Application of feature matching for augmented-reality recovery: AR systems make use of anchor points (i.e., associations between the 2-D laparoscopic view and the 3-D CT model) to maintain an augmented view (frame "before occlusion"). However, occlusions can cause the loss of the anchor points (frame "occlusion"), and then the loss of the augmented view (frame "after occlusion"). HMA can be used to automatically recover the lost anchor points and thus the augmented view (frame "recovered augmentation").

Due to the sensitive nature of the MIS scenario, feature-matching methods have to comply with safety requirements to be approved for use in the operating room. High accuracy is required in order to precisely recover a high number of features. Moreover, real-time and robust performance are crucial for any in vivo MIS application. For these reasons, HMA has been extensively evaluated with respect to accuracy, robustness, and time over two large and varied in-lab and MIS image datasets (the latter including images from six 3-h-long surgical interventions). In this work, we decided to focus on the comparison among three of the most popular features: SIFT [16], ASIFT [24], and SURF [25]. While other sparse and dense features have been proposed in the past years [26]–[29], the aforementioned ones represent a good compromise, being invariant to a large number of affine parameters while still being extractable almost in real time.
In all the aforementioned evaluation scenarios, we show that HMA outperforms the existing state-of-the-art methods in terms of speed, inlier-detection rate, and accuracy.

A. Related Work

The first step in feature matching consists of extracting salient features from the two images of the same scene. Several algorithms have been proposed to detect features (SIFT, SURF, ASIFT), each one with different invariance properties. After this feature-extraction part, a feature-matching step follows, which in general consists of two phases: first, an appearance-based matching phase [30], in which the local appearance of each feature is used to determine a set of candidate (or initial) matches; second, these initial matches are pruned of appearance ambiguities by means of an additional geometric-validation phase, which leverages the spatial arrangement of the features. In the past years, several approaches have been proposed to address this second geometric-validation step. For example, in [31], the authors use an image database to construct a 3-D model of the object of interest and finally match the features of the query image against the 3-D object. However, this algorithm is computationally expensive, it assumes that the object is nondeformable, and it needs a large set of images to build the 3-D model. Other approaches have been proposed in [23], [32]–[34] to model feature matching as a graph-matching problem. In general, the possible matches are considered as nodes, while their disagreement (or agreement) is measured by the weight of each edge, modeled by an energy function. In [32], geometric constraints are used to penalize those matches that change their relative length and orientation between the two images. However, these constraints are only suitable for rigid movements, and do not work under significant object deformations or viewpoint change.
In [34], an energy function is used to penalize the occluded features, and the geometric constraints of [32] are relaxed to neighboring matches. Despite these improvements, these methods are not robust to viewpoint changes. In [33], geometric constraints are used to represent each feature position as an affine combination of its neighbors. Even if this algorithm is robust to affine viewpoint changes, it performs poorly with large occlusions and nonrigid deformations. The method proposed in [35] creates additional (randomly-distorted) training images, and estimates a single homography mapping together with the associated inliers. However, because of the uncontrolled (random) generation of these images, only a limited set of correspondences can be detected. We focus here on some recent feature-matching algorithms that are more appropriate for dealing with large camera movements, occlusions, and deformations. In particular, the algorithm in [16] detects a predominant set of matches (inliers) that satisfy a unique affine transformation [36]. While this method can reliably discard many wrong matches, only the limited number of matches agreeing with this single transformation is kept. The above issue was addressed in [22], where multiple local-affine transformations have been used to detect a larger number of matches, uniformly distributed over the entire nonplanar object surface. However, due to its high computational time, this method could only be used for offline data processing. The work in [23] defines a dissimilarity measure between the matches based on both a geometrical and an appearance constraint. Finally, an agglomerative step generates clusters of matches by iteratively merging similar ones (according to the dissimilarity measure). A drawback of this algorithm is its high computational complexity for an increasing number of matches [37]. The HMA algorithm presented in this paper improves over the existing methods according to the following.
HMA uses multiple affine transformations to accurately map features between the two images. Because of this, HMA can also detect a larger percentage of correct matches when compared with [16] and [23].

HMA is fast, because of the adoption of a hierarchical feature-clustering phase. In particular, when a large number of features are detected (e.g., when using high-definition images or extracting ASIFT features), HMA is almost one order of magnitude faster than [23] and [22]. HMA is robust to outliers, because it incorporates several robust techniques (e.g., RANSAC and nonparametric data analysis). HMA is an outgrowth of our recent conference paper [38], which we have improved upon in several directions. First, we improved the feature-matching performance and reduced the computational time by adopting an initial Hough-voting phase. Second, we extended HMA's evaluation and comparison by including a matching-performance metric based on ROC analysis [39]. Finally, we evaluated HMA over a large and manually annotated dataset of in-lab and in vivo images. HMA and the image database are made available on the Internet for the entire community.1

The paper is organized as follows. Section II introduces the feature-matching problem, the basic notation, and the details of the HMA algorithm. Section III reports the results of our extensive experimental evaluation and the comparison of HMA with state-of-the-art feature-matching methods. Finally, in Section IV we discuss our results.

II. METHODS

We introduce here the basics of feature matching, and highlight the need for more advanced methods when dealing with laparoscopic images. In Section II-D we then introduce the hierarchical multi-affine (HMA) algorithm.

A. The Feature-Matching Problem

Consider a pair of images, a training image (e.g., before an occlusion) and a query image (e.g., after an occlusion), and two corresponding sets of image features (e.g., SIFT [16], SURF [25], ASIFT [24]) extracted from the training and query images, respectively.
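For concreteness, each such feature bundles keypoint geometry with an appearance descriptor, as detailed next. A minimal container for one feature might look as follows (field names are ours, chosen for illustration, not the paper's notation):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Feature:
    """One image feature: keypoint geometry plus a local-appearance descriptor."""
    x: float                 # keypoint pixel position (x)
    y: float                 # keypoint pixel position (y)
    scale: float             # detection scale (drawn as the circle radius in Fig. 3)
    orientation: float       # dominant local gradient direction, in radians
    descriptor: np.ndarray   # appearance vector, e.g. 128-D for SIFT

f = Feature(x=120.5, y=64.0, scale=3.2, orientation=0.4,
            descriptor=np.zeros(128))
print(f.descriptor.shape)  # → (128,)
```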
Each feature consists of a keypoint vector [16], which stores the geometric characteristics of the feature (such as its position), and of a descriptor vector, which captures the local appearance around the keypoint position. The main goal of a feature-matching algorithm (see the diagram in Fig. 2) is to retrieve pairs of similar features (correspondences) among the two images by leveraging both the appearance and the geometric information contained in all these features. Existing feature-matching algorithms consist of two phases: an appearance-based matching and a geometric validation. The appearance-based matching uses the information encoded in the descriptor vectors to obtain a set of initial (candidate) matches [40], [41]. A popular method is the nearest neighbor distance ratio (NNDR) [16], which matches a query feature with a training feature when the two have the closest distance between descriptor vectors, and when the ratio between the closest and the second-closest descriptor distances is less than a threshold.

1 Source code available online.

Fig. 2. Diagram of the HMA feature-matching algorithm: HMA is specifically used to remove incorrect initial matches.

However, due to appearance similarity, this set of initial matches may contain a large number of incorrect matches. In order to prune the initial matches of these outliers, a geometric-validation phase is usually adopted, which models the geometric motion of one (or a group of) candidate matching features between the two images. As a result, those matches that agree with this geometric model are considered inliers (i.e., correct matches); otherwise, they are discarded as outliers. In what follows, we assume a given set of initial matches (e.g., computed by using NNDR). We detail the geometric-constraint phase (cf. Sections II-B and II-C) and highlight potential problems of this phase when applied to laparoscopic images. This will lead to the design of the hierarchical multi-affine algorithm (cf. Section II-D).

B. Imposing Geometric Constraints

Geometric constraints can be used to model image transformations of one (or a group of) features from the training image to the query image. These geometric constraints are usually adopted to predict the feature mapping from one image to the other, and vice

versa. Because of this, geometric constraints can be leveraged to detect incorrect matches that, differently from the majority of initial matches, do not agree with this mapping. Examples of such constraints are the similarity transformation (which models feature rotation, translation, and scale) and the affine transformation (which also includes shear). As detailed in the following Sections II-B1 and II-B2, these geometric transformations can be directly estimated from the keypoint vector, which usually contains the feature (pixel) position, its scale, and its orientation. For example, Fig. 3 shows SIFT and SURF keypoints detected in an endoscopic image.

Fig. 3. Left: SIFT keypoints detected in an endoscopic image: the position of each feature is represented by the center of the circle, the scale is proportional to the radius, and the orientations indicate the directions of the most prominent local image gradients around the feature position. Right: SURF keypoints: differently from SIFT, SURF keypoints tend to be less dense on textureless areas.

1) Similarity Transformation: A similarity transformation maps a generic image point x (which does not have to correspond to a keypoint location) to x', according to the following model:

x' = s R(θ) x + t    (1)

where the parameters of the similarity transformation are the scale change s, the 2-D rotation angle θ (with rotation matrix R(θ)), and the translation t between keypoints. These similarity parameters can be estimated from the parameters of a matched keypoint pair, with positions x_t and x_q, scales σ_t and σ_q, and orientations φ_t and φ_q, as follows: s = σ_q/σ_t, θ = φ_q − φ_t, and t = x_q − s R(θ) x_t.

2) Affine Transformation: An affine transformation models the rotation, translation, scale, and shear of image features, and is given by the following [36]:

x' = A x + t    (2)

where A is a 2×2 matrix. The six affine-transformation parameters (the entries of A and t) can be estimated from at least three (noncollinear) matches by first rewriting (2) into a linear form in homogeneous coordinates, and then by calculating a least-squares solution.

C.
Feature Matching by Imposing (Single) Affine Constraint

Geometric constraints (e.g., an affine transformation T) are often used to model the mapping for groups of initial matches. The quality of this mapping, and of each potential match, is represented by the (pixel) symmetric reprojection error

e(x_t, x_q) = ||x_q − T(x_t)|| + ||x_t − T⁻¹(x_q)||    (3)

which accumulates the mapping error in both directions. Those matches that exhibit a reprojection error larger than a threshold are considered outliers (i.e., wrong matches). Usually, the estimation of both the transformation and of the inliers is performed by means of RANSAC (RANdom SAmple Consensus) [42]. In brief, RANSAC randomly selects a minimal number of matches to estimate an instance of the model (e.g., three matches to estimate an affine transformation) and checks how many other matches (the consensus) agree with this minimal model. This random selection is iterated many times, until a final transformation is obtained that has a large consensus, or until a maximum number of iterations is reached.

Fig. 4. Imposing a geometric (affine) constraint on feature matches. (a) Set of initial matches (better seen in color); the correct matches are shown in (green) solid lines, and the wrong matches in (yellow) dashed lines. (b) Subset of matches agreeing with the affine transformation, with a threshold of 1.5 pixels. Note that this set contains only a few correct matches, localized in a portion of the image.

An example of this single-affine robust estimation is illustrated in Fig. 4. The initial matches are shown in Fig. 4(a): for clarity of presentation, we indicate the correct matches with solid (green) lines, and the inaccurate matches with (yellow) dashed lines. Fig. 4(b) shows the correct matches obtained in our experiments as the result of an affine transformation estimated with RANSAC. Note that the refined matches in Fig.
4(b) do not contain outliers and are correctly mapped by the estimated transformation. However, we have observed an important drawback of this method, namely its capacity to recover only a limited number of matching features, lying on an (almost planar) portion of the organ surface (cf. the polygon in Fig. 4). This happens because the affine constraint models the feature motion only as a rigid motion plus shear, and it is thus more appropriate when observing planar surfaces. This phenomenon was never reported before in the literature, and it motivated our team to search for a better solution to the feature-matching problem in the general case of a nonplanar (e.g., organ's) surface.

D. Hierarchical Multi-Affine (HMA) Algorithm

HMA improves over the aforementioned limitations by estimating a set of multiple and spatially-distributed affine transformations. As illustrated in Fig. 5, each transformation

maps any image feature from the training image to its corresponding region in the query image. In estimating these transformations, HMA simultaneously computes the set of final matches (inliers) that support these local affine transformations. As we would expect, multiple affine transformations adapt to the object surface more precisely than a single transformation. As an effect, they can 1) retrieve a larger number of correct matches, and 2) estimate a set of highly-accurate image transformations.

Fig. 5. From a set of initial (appearance-based) matches, HMA retrieves a set of refined (or final) matches, and estimates a set of local affine transformations (better seen in color).

Fig. 6. Illustrative example of the hierarchical clustering of the matches: the initial matches (correct matches are represented by solid lines, wrong ones by dotted lines) are iteratively divided into smaller clusters (represented by polygons), each one limited to a portion of the image. Note that this clustering generates clusters spatially distributed over the whole scene.

We present here a general overview of the HMA algorithm; each stage will be further detailed in Section II-E. From a given set of candidate matches, HMA clusters the associated keypoints into contiguous areas, spatially distributed over the entire organ's surface. As illustrated in Fig. 6, this clustering is hierarchical: each cluster of matches is represented by a tree node, while the edges represent the expansion of a cluster (node) into sub-clusters (children nodes). The root node of the tree contains all the appearance-based matches, as indicated by the (black) polygon in the root node of Fig. 6. These matches are clustered into disjoint portions, and geometric constraints are enforced in each cluster to remove outliers (see the colored polygons in the inner node).
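The cluster-and-refine recursion illustrated in Fig. 6 can be skeletonized as below. This is an illustrative outline only, with caller-supplied clustering and model-fitting functions standing in for HMA's k-means and RANSAC stages; the names and thresholds are ours, not the paper's:

```python
def expand(matches, cluster_fn, fit_fn, min_matches=4, max_error=1.5):
    """Recursively cluster matches and fit one local model per cluster.

    cluster_fn(matches) -> list of sub-clusters;
    fit_fn(cluster) -> (model, inliers, mean_error).
    A cluster becomes a leaf when its fit is accurate enough; otherwise its
    inliers are expanded further. Returns a list of (model, inliers) leaves.
    """
    leaves = []
    for cluster in cluster_fn(matches):
        if len(cluster) < min_matches:
            continue                       # too few matches to fit a model: discard
        model, inliers, err = fit_fn(cluster)
        if err <= max_error or len(inliers) == len(cluster):
            leaves.append((model, inliers))                # terminal "leaf" node
        else:
            leaves.extend(expand(inliers, cluster_fn, fit_fn,
                                 min_matches, max_error))  # keep expanding
    return leaves

# Toy run: split lists in half until small; a zero-error "fit" keeps everything.
split = lambda ms: [ms[:len(ms) // 2], ms[len(ms) // 2:]] if len(ms) > 4 else [ms]
fit = lambda c: ("model", c, 0.0)
print(len(expand(list(range(8)), split, fit)))  # → 2
```

The stop criterion here (mean error below a threshold, or no outliers left) is a stand-in for the leaf condition described in the text.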
Note that the first node to the left ("inner node") still contains some outliers, thus requiring further expansion, while the right node has already reached a terminal state ("leaf node"), since it has the minimum number of supporting matches and does not contain any wrong matches. Thanks to this clustering formulation, HMA provides several advantages with respect to the other existing techniques.

Fig. 7. HMA algorithm: the initial matches in the root node are passed to an initial clustering, which detects and removes potential outliers. The remaining matches are passed to an expansion phase, which consists of three steps: clustering, affine estimation, and correction. As a result, the root node is divided into k new nodes. These new nodes are subject to a stop criterion to determine whether each node requires further expansion. This expansion phase is repeated until each node reaches a leaf state.

Accuracy: The use of multiple affine transformations is effective in removing a high percentage of incorrect matches on different areas of the organ's surface.
Speed: The expansion of each node allows each new branch of the tree to be processed separately. This permits the user to stop the expansion of each branch when a desired accuracy (or tree depth) is achieved, thus reducing the computational time. Note that the hierarchical structure of HMA lends itself to further acceleration through a parallel implementation on a multicore machine, by processing each sub-tree on a different processor.
Robustness: Each feature in the training image will be associated (i.e., matched) to only one feature in the query image. This is done by guaranteeing that each cluster (and thus, each affine transformation) is populated by contiguous and nonoverlapping sets of matches. Furthermore, each local affine transformation is estimated by means of RANSAC, which makes HMA robust to outliers.

E.
HMA: Block Diagram and Phases

We detail here each phase of the HMA algorithm, illustrated in the block diagram of Fig. 7. The appearance-based matches are passed to a Hough-voting phase, which is used to find clusters of features with similar keypoint parameters. In doing so, this phase can discard those

matches that have very low votes. These outliers are sent to an outlier buffer. The resulting inliers are instead assigned to the root node of the tree. This node is passed into a node-expansion phase, which consists of three steps. 1) A clustering step, which partitions the matching keypoints into clusters (each one containing matches with similar keypoint parameters). 2) For each cluster, a robust affine-estimation step is used to estimate an affine transformation, as well as the corresponding sets of inliers and outliers. At this level, the remaining outliers will be referred to as hard outliers; they are removed from the node and sent to the outlier buffer. 3) Finally, a correction step is used to a) verify that the clusters are spatially disjoint, b) if necessary, reassign matches from one cluster to another in order to ensure disjoint clusters, and c) update the sets of inliers and outliers. After these stages, each affine transformation, together with its corresponding inliers and outliers, defines a new child node. A stop criterion is adopted at this point to check when to stop the node expansion and deem a node a leaf. The set of final matches and the set of final transformations are extracted from the leaf nodes that exceed a threshold on the minimal number of inliers. However, since some correct matches could have been erroneously labeled as hard outliers in a previous phase, a final top-down phase tries to recover them by checking their pixel reprojection error against each of the final affine transformations. It is evident at this point that, even if HMA shares the same tree-like structure as hierarchical k-means [43], it also has significant differences. First, HMA makes use of two additional stages (affine estimation and correction) in order to generate a clustering of the matches that results in separated image regions.
Second, HMA can detect outliers by integrating a geometric-constraint phase. Finally, HMA expands a node only when necessary, and not a fixed number of times.

1) Initial Hough Voting: Since the set of initial (appearance-based) matches may contain a large number of incorrect matches [e.g., those in Fig. 4(a)], HMA adopts a Hough voting to remove potential outliers. In doing so, the probability of success of the next phases increases, which also leads to a lower expected computational time [16]. This voting is done in the space of keypoint parameters, which is discretized using broad bin sizes: 0.25 times the dimension of the larger image for both the translation of the similarity transformation and the image position, 30° for the orientation, and a factor of 2 for the scale. The Hough voting first computes the similarity parameters of the matches (cf. Section II-B1). Then, each match votes for the bin closest to its similarity parameters. Bins with many votes represent matches with consistent parameters, while bins with fewer votes represent potential outliers. For this reason, those matches voting for bins with fewer than three votes are removed, since three is the minimal number of matches required to fit an affine transformation (cf. Section II-B2). These removed matches are sent to the hard-outliers buffer, while the remaining ones are passed to the node-expansion phase.

2) Node Expansion: Clustering Step: This represents the first step in the node-expansion phase, which is executed at every level of the tree. In this step, the matches in the input node (e.g., the root node, if the current tree level is 0) are partitioned into clusters (see Fig. 8) by applying k-means [43] over a six-dimensional vector consisting of the query-keypoint position together with the four similarity-transformation parameters (cf. Section II-B).

Fig. 8. Clustering step: example of the resulting sets of the clustering step at an early stage of the tree. Note that clusters 1 and 4 successfully isolate the majority of correct matches.

We observed that clustering in this six-dimensional space is key, because it simultaneously leverages the spatial position of each keypoint together with the geometric information given by the similarity parameters. As a result, the obtained clusters will contain features that are both spatially close and close in their similarity parameters (which indeed represent a first approximation to each local affine transformation). As shown in the example of Fig. 8, we observed that the translation parameters play an important role in discriminating between correct and incorrect matches at early levels of the tree, such as the root node. Giving a higher importance to these translation parameters in the vectors passed to k-means is key to successfully isolating most of the outliers, as we can see in Fig. 8(b) and (c). Meanwhile, we also noticed that the keypoint-position components are more important in discriminating the correct matches at deeper levels of the tree. In fact, in these cases, the position parameters are ideal to ensure the spatial contiguity of the clusters, while the translation parameters are less informative due to their larger variation within smaller clusters. Finally, we observed that the scale and orientation parameters are sensitive to viewpoint changes at every level of the tree, especially when the observed object is nonplanar. Due to the above facts, HMA weights the components of each vector of matches before applying k-means, depending on the average reprojection error at that node. The weights are computed by interpolating between given initial and final weight vectors according to an increasing function α of the average symmetric reprojection error at that node (taken at its maximum value for the initial iteration). The function α takes values in the interval [0, 1]: it increases smoothly with the average error up to a maximum mapping-error threshold, and is equal to 1 otherwise.

Fig. 9. Example plot of the α function.

The α function is shaped by two parameters: the maximum mapping-error threshold and a smoothing parameter; Fig. 9 shows an example plot. After extensive empirical evaluation, we determined values for the initial and final weight vectors that provide a good trade-off between outlier-detection capability and spatial contiguity of the clusters. Note that, when the number of outliers is large (such as at the initial levels of the tree), the weighting gives more importance to the translation parameters, thus separating matches with large translations (possible outliers). When the number of outliers is reduced, the weighting reduces the importance of the translation parameters and increases that of the position components (thus enforcing the spatial contiguity of the clusters). From our experience, a good value for the number of clusters is k = 4, since it offers a good balance between simplifying the problem (i.e., generating four easier subproblems) and not compromising the overall accuracy. A larger k will instead tend to generate clusters with few matches, sometimes making it impossible to fit any affine transformation.

3) Node Expansion: Affine-Estimation Step: This is the core component of HMA, in which the geometric constraints are imposed at each node, in order to simultaneously estimate the local affine transformation and remove possible outliers. First, the affine-estimation step verifies that the input cluster contains enough matches to fit an affine model (i.e., more than three matches); otherwise, the cluster is discarded and its matches are sent to the hard-outliers buffer. An affine transformation is then estimated by means of RANSAC (cf. Sections II-B2 and II-C); if the consensus is greater than a minimum value, the inliers and outliers are computed, and a final affine model is estimated from the obtained inliers.
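The estimate-then-refit loop just described can be sketched as follows. This is a minimal single-cluster illustration in the spirit of Section II-C, not the authors' implementation; the reprojection error here is one-sided rather than symmetric, and the iteration count, tolerance, and minimum consensus are assumptions:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine transform mapping src (N,2) onto dst (N,2),
    from N >= 3 noncollinear correspondences (cf. Eq. (2))."""
    X = np.hstack([src, np.ones((len(src), 1))])   # homogeneous coordinates
    B, *_ = np.linalg.lstsq(X, dst, rcond=None)    # solves X @ B ≈ dst
    return B.T                                     # 2x3 matrix [A | t]

def ransac_affine(src, dst, iters=200, tol=1.5, min_consensus=4, seed=0):
    """RANSAC: sample minimal 3-match sets, keep the largest consensus,
    then refit the affine model on all of its inliers."""
    rng = np.random.default_rng(seed)
    Xh = np.hstack([src, np.ones((len(src), 1))])
    best = None
    for _ in range(iters):
        idx = rng.choice(len(src), 3, replace=False)   # minimal sample
        A = fit_affine(src[idx], dst[idx])
        err = np.linalg.norm(Xh @ A.T - dst, axis=1)   # one-sided reprojection error
        inliers = np.flatnonzero(err < tol)
        if best is None or len(inliers) > len(best):
            best = inliers
    if best is None or len(best) < min_consensus:
        return None                                    # cluster would be discarded
    return fit_affine(src[best], dst[best]), best      # final model from the inliers

# Five points related by a known affinity, plus one corrupted match.
A_true = np.array([[1.1, 0.1, 5.0], [-0.2, 0.9, 3.0]])
src = np.array([[0.0, 0], [10, 0], [0, 10], [10, 10], [5, 5], [3, 7]])
dst = np.hstack([src, np.ones((6, 1))]) @ A_true.T
dst[5] += 50.0                                         # gross outlier
A_est, inliers = ransac_affine(src, dst)
print(sorted(inliers.tolist()))  # → [0, 1, 2, 3, 4]
```

In HMA this estimation runs once per cluster; matches rejected here go to the hard-outliers buffer, with a later chance of recovery in the top-down phase.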
If the minimal consensus is not reached, the cluster is discarded and the matches are sent to the buffer. As anticipated in the previous section, after the entire tree is created, a final opportunity will be given in the top-down phase to the outliers in the buffer to be associated with one of the final transformations.

Fig. 10(a) shows an example of the ATs and the sets of inliers (colored dots) obtained from the clusters in the example of Fig. 8. As observed, these regions nicely capture keypoints according to the slope of the organ's surface. Also note that some clusters were discarded because it was not possible to fit an affine transformation with a minimum consensus.

Fig. 10. (a) Example of affine transformations (the planar regions) and inliers for each cluster (better seen in color); observe that the local region associated with each cluster (colored square) may overlap in both images. (b) and (c) Example of the correction phase. (b) Two clusters of inliers (blue and red points) and their associated image regions (red and blue squares). Note that these regions overlap (yellow circle), which negatively affects the performance of HMA. (c) LDA computes a separation (solid yellow line) between both clusters, generating two nonoverlapping image regions (colored regions).

Finally, in order to remove isolated matches with large residuals,5 HMA adopts a nonparametric outlier-detection technique [44]. In particular, from the statistics of the symmetric reprojection errors in each node, a threshold is built to discard matches with a larger error. This threshold is computed from a quartile of the reprojection errors over all the matches, scaled by a fixed factor.6 Those matches with a reprojection error outside this statistical measure are removed from the cluster and sent to the buffer. For each cluster, the resulting affine transformation and the sets of inliers and outliers are passed to the correction step. Note the following key observations regarding the affine-estimation step.
As is well known for RANSAC, choosing the right consensus value is crucial. In our experience, a minimal consensus of six matches represents a good trade-off between robustness and cluster rejection. Clusters with a low percentage of correct matches are rejected, since the RANSAC estimation assumes that a majority of the matches are inliers. The number of clusters and the minimal consensus are closely related: for example, a large number of clusters (which reduces HMA's computational complexity) will generate clusters with fewer matches, thus requiring a smaller minimal consensus (which is less robust to outliers) to avoid immediate cluster rejections.

4) Node Expansion: Correction Step: Even after the clustering and affine-estimation steps, some feature matches belonging to distinct clusters (i.e., distinct affine transformations) may still overlap, i.e., they may share a common image area. For example, Fig. 10(b) shows an overlap (blue and red points) of two clusters after the affine-estimation step. This overlap can negatively affect the performance of our algorithm: when computing the affine mapping of a generic image feature lying in the overlapping region, it is not clear which affine transformation should be used. The goal of the correction step is to solve this ambiguity by creating nonoverlapping clusters of matches. We addressed

5 These matches could negatively affect subsequent clustering phases. 6 The factor was fixed empirically.

Fig. 11. In-lab experiment. (a) Some of the images used in the rotation set. The left image was fixed as the training image, while the object in the query images was rotated by up to 30° about its vertical axis. (b) Some of the images in the deformation set, where a soft object (left image) was deformed with different levels of strength.

this problem by passing the query-keypoint positions and their corresponding cluster class indexes as training data to a linear discriminant analysis (LDA) algorithm [43]. LDA uses this training data to learn the linear parameters used to separate feature matches within each cluster. Finally, the inliers and outliers are used as the testing set for LDA. In this way, each image region is reassigned to only one cluster, as observed in the example of Fig. 10(c). Once all the matches are reassigned, the sets of inliers and outliers are updated with a final estimation of the affine transformation.

5) Stop Criterion: In order to stop the expansion of a node and deem it a leaf, HMA examines whether there is any benefit in further expanding it. This is achieved by measuring the node's ratio of inliers: a node is deemed a leaf when this ratio exceeds a threshold (0.9 in our implementation; cf. Section III-D).

6) Top-Down Phase: After every node reaches a leaf state, a final opportunity is given to retrieve correct matches that were erroneously classified as hard outliers; the matches in the buffer are recovered and associated with the (spatially) closest node by examining the symmetric reprojection error.

III. EXPERIMENTS AND RESULTS

A. Overview of the Experimental Evaluation

We thoroughly compared the performance of HMA in two scenarios: 1) a highly-controlled in-lab dataset with nonplanar objects, and 2) a large laparoscopic-surgery dataset with more than 100 images acquired from six real videos of partial-nephrectomy interventions.
The in-lab dataset consists of 18 image pairs simulating highly-controlled cases of occlusion, viewpoint changes, and deformations. Fig. 11 shows a representative example for each of these scenarios. Note that, while the popular graffiti image was chosen as the texture for the in-lab objects, our in-lab dataset substantially differs from other computer-vision databases because in our case this texture is applied to nonplanar object surfaces. The laparoscopic-image dataset contains many cases in which there is a real need to accurately retrieve a precise and large number of matches. These cases range from prolonged and complete camera occlusion (e.g., due to the surgical-tool motion in front of the camera), to camera retraction and reinsertion, sudden camera motion, and specular reflections. Fig. 14 shows image pairs that are representative of each of these scenarios. In order to provide an extensive evaluation of HMA over all of the above scenarios and over several features (SIFT, ASIFT, and SURF), as well as a thorough comparison against other state-of-the-art feature-matching algorithms (cf. Section III-D), we manually annotated the data (both the features and the matches) extracted from the aforementioned endoscopic images. This ground-truth data was captured by four expert users: three users provided independent annotations, and the fourth user resolved any conflicts among the three annotations. Initial SIFT matches were manually labeled by each user as correct or incorrect by carefully observing their positions in the two images; only the most certain matches were labeled as correct. A set of corresponding corners was also manually selected by each user in each image pair; again, only the most certain image correspondences were selected. Note that SURF and ASIFT matches were not labeled due to the large number of their initial matches.
Also, note that for SURF and ASIFT features we did not limit our datasets to only the strongest features, in order not to alter the performance (e.g., the percentage of correct matches).7 Our comparison (for each feature type) is based on measuring the algorithms' accuracy in detecting the correct matches (matching performance), as well as their accuracy in mapping ground-truth corresponding points between images (mapping performance). These measures are described in detail in Section III-C. Additionally, we compared the computational time of each algorithm, measured in seconds of CPU time8 required by each run. In our experimental evaluation (cf. Sections III-E and III-F) we compare, analyze, and discuss the performance of all the aforementioned algorithms, and show that HMA represents a considerable improvement over the existing methods.

B. Comparison With Existing Datasets

Our benchmark improves over existing databases because it includes many carefully-annotated surgical images for accurate testing in a minimally-invasive surgical scenario. On our dataset webpage we provide a detailed description of both the DB image pairs and the (ground-truth) matching and mapping data, to be used by the community to evaluate future algorithms. To the best of our knowledge, only two other large datasets have been made publicly available. The dataset in [45] contains many stereo videos from real endoscopic surgeries. However, and differently from our benchmark, [45] does not currently contain any MIS image pairs from challenging cases that can be used for evaluating feature-matching algorithms. Furthermore, ground-truth corresponding features and matching labels are not currently provided. Finally, [45] does not include any in-lab experiments (with ground-truth features) under controlled object rotations and deformations.
The dataset in [46] contains thousands of images to be used in many computer-vision problems, such as image retrieval,

7 For example, ASIFT can produce more than 4000 initial matches per image, thus rendering the ground-truth (manual) labeling practically unfeasible. 8 Intel Core i7 2670QM 2.20 GHz, Intel Corp., Santa Clara, CA, USA.

classification, recognition, and, finally, matching. However, this dataset does not contain any endoscopic-image cases, and the provided images are only obtained by applying several (known) homography distortions to single images of (almost-planar) scenes (e.g., a building facade). As a result, this dataset cannot be used to compare the efficiency and robustness of feature-matching algorithms towards scene distortions, as well as towards possible reflections, clutter, and illumination changes (which are very common in real surgical scenarios when moving the endoscopic camera to a different viewpoint).

C. Validation Metrics

The matching-performance metric compares the final matches against a ground truth of manually-labeled matches. We base our analysis on receiver operating characteristic (ROC) curves [39], which are commonly used to visualize, organize, and select classifiers based on their classification performance. ROC curves depict the relative trade-off between the sensitivity (or recall) and the 1-specificity. Given a set of initial and final matches with known labels (correct or wrong), the true positives are those final matches labeled as correct, and the false positives are those final matches labeled as wrong. These sets are used to compute sensitivity = TP / P and 1-specificity = FP / N, where TP and FP are the numbers of true and false positives, and P and N are the numbers of initial matches labeled correct and wrong, respectively. The ROC curves are parameterized by a score, which indicates the confidence with which the algorithm accepts or rejects each match, e.g., the negative of the reprojection error of each match when mapped by its corresponding AT. The matches are sorted in descending order of score; this order is used to iteratively compute and plot the cumulative sensitivity and 1-specificity (cf.
[39] for illustrative examples). The mapping-performance metric measures how precisely each algorithm's transformation maps a manually-selected set of corresponding points between each image pair (cf. Section III-A). This is done by computing the symmetric reprojection error [36] of these points. Note that this measure is independent of the number of initial matches (which varies with different thresholds and different features); in fact, the mapping performance only requires a set of known corresponding points between the images.

D. Description of the Comparing Methods

In what follows, we briefly describe the three recent feature-matching algorithms that we compared against HMA. These algorithms differ in the geometric constraints used to detect and remove outliers from the initial matches (cf. Section II-A). We also provide a short analysis of the strengths and weaknesses of each algorithm.

Lowe's algorithm [16]: This method works similarly to the strategy described in Section II-C, by estimating a single affine transformation that maps features from the training to the query image. The refined matches are only those that obey such a geometric affine transformation within a specific reprojection-error pixel threshold. This transformation is estimated by first using a voting scheme (Hough) to cluster the initial matches into sets with similar similarity-transformation parameters. For each cluster, a single affine transformation is estimated (cf. Section II-C), and a probabilistic model is used to select the best model.

Adaptive multi-affine (AMA) algorithm [22]: AMA relaxes the assumption of a single affine model by estimating a set of multiple affine transformations, each associated with a cluster of matches. As a result, the inliers (matches) are distributed along the entire organ's surface. In AMA, a set of clusters is estimated as in [16], and the clusters are then sent to a cascade of RANSAC-based affine estimators.
For each cluster, a supporting transformation is computed. An adaptive procedure is then used to select the best number of clusters for k-means, and to estimate all the transformations and the corresponding inliers. AMA extracts more inliers than Lowe's approach; however, the adaptive process is computationally expensive [38].

Agglomerative correspondence clustering (ACC) algorithm [23]: ACC determines the set of refined matches by employing a hierarchical clustering algorithm based on an agglomerative (bottom-up) strategy [43]. This strategy iteratively merges pairs of matches (or clusters of matches) into a single cluster based on a dissimilarity measure between matches (or clusters) consisting of both geometric and appearance constraints. ACC iteratively merges clusters according to both their dissimilarity measure and a linkage criterion to generate the final clusters. ACC requires the user to specify a larger number of (nonintuitive) parameters than Lowe, AMA, and HMA.

For a fair comparison, all of the aforementioned algorithms were implemented in MATLAB and tested on the same dataset. Our implementation of HMA used a RANSAC threshold of 5 pixels, a minimal consensus threshold of 6, a threshold of 0.9 over the ratio of inliers, and the empirically-determined weight vectors (cf. Section II). Similarly, we used the same threshold of 5 pixels for AMA and Lowe. ACC uses a cutoff value of 10 for the linkage function (a threshold over the dissimilarity measure [23]), which we observed to maximize its performance. The remaining parameters for Lowe, AMA, and ACC were fixed as originally reported in [16], [22], and [23], respectively.

E. In-Lab Experiments

The goal of the in-lab experiments is to evaluate the performance of the algorithms under highly-controlled cases of rotation and deformation.
This dataset consists of 18 image pairs divided into two testing cases. The rotation case consists of seven image pairs of a nonplanar object, in which the query images were taken under controlled object rotations (up to 30°), as shown in Fig. 11(a). The other 11 image

TABLE I IN-LAB EXPERIMENT: ROTATION AND DEFORMATION SETS, AVERAGE REPROJECTION ERRORS (PIXELS) PER IMAGE PAIR. TABLE II IN-LAB DATASET: ALGORITHMS' AVERAGE PERFORMANCE.

Fig. 12. In-lab experiment, matching performance (better seen in color). (a) Plots of the AUC of the ROC curves for the rotation case. (b) AUC curves for the deformation case.

pairs correspond to the deformation case, in which a soft object was subjected to different levels of deformation (low, medium, strong), as Fig. 11(b) illustrates. All images share the same resolution, and the average numbers of initial matches for SIFT, SURF, and ASIFT are 196, 239, and 4000 for the rotation set, and 540, 545, and 8761 for the deformation set. Our mapping ground truth consists of an average of 55 corresponding corners for the rotation set and 30 for the deformation set, which were carefully selected from different parts of the entire surface of the objects. We computed the area under the curve (AUC) of each ROC curve in order to better compare the performance of the methods in these in-lab experiments. Fig. 12 shows the comparison of the four algorithms in both testing cases; note that larger AUC values (i.e., closer to 1) indicate better matching performance. Table I provides detailed statistics of each algorithm's mapping performance for each testing case: the first group of columns represents the different image-pair rotations, while the remaining columns represent the different levels of deformation (low, medium, and strong).

Fig. 13. Qualitative example of the matching performance of the in-lab experiment for SIFT features. Note that the single affine transformation (yellow arrow) estimated by Lowe tends to accumulate large reprojection errors (yellow lines) when mapping those ground-truth correspondences (green stars) that do not support the affine transformation (polygon). (Better seen in color).
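The ROC quantities used in this evaluation (sensitivity, 1-specificity, and the AUC of Section III-C) can be computed from scored, labeled matches as in the following sketch. This is our own illustration, not the paper's code; the scores would be, e.g., the negative symmetric reprojection errors of the matches:

```python
import numpy as np

def roc_points(scores, labels):
    """ROC operating points from per-match scores (higher = more confident)
    and ground-truth labels (True = correct match).

    Returns (fpr, tpr): cumulative 1-specificity and sensitivity arrays."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels, dtype=bool)[order]
    tp = np.cumsum(labels)           # true positives accepted at each cutoff
    fp = np.cumsum(~labels)          # false positives accepted at each cutoff
    P, N = labels.sum(), (~labels).sum()
    tpr = np.concatenate([[0.0], tp / max(P, 1)])   # sensitivity
    fpr = np.concatenate([[0.0], fp / max(N, 1)])   # 1-specificity
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve via the trapezoidal rule."""
    return float(np.sum(0.5 * (tpr[1:] + tpr[:-1]) * np.diff(fpr)))
```

Sorting by descending score and accumulating true/false positives reproduces the cumulative sensitivity and 1-specificity described in Section III-C; a perfect ranking of the matches yields an AUC of 1.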
Table II summarizes the average results for both cases: the first four rows contain the results for the rotation case and the last four those for the deformation case. The second, fifth, and eighth columns show the mean and standard deviation of the symmetric pixel reprojection error. The third, sixth, and ninth columns present the required computational time (in seconds), while the fourth column indicates the sensitivity and 1-specificity values, and the seventh and tenth contain the percentage of inliers, which we use as a heuristic to indicate the detection power of each algorithm. Fig. 13 shows an example of qualitative results when SURF features are used. Note that HMA, AMA, and Lowe retrieve both a set of refined matches and multiple (or a single) affine transformations. Note that Lowe's single affine transformation only captures a reduced number of matches, localized on a strip of the object's surface, thus indicating the difficulty to reliably

TABLE III LAPAROSCOPIC DATASET: ALGORITHMS' PERFORMANCE.

Fig. 14. Examples of image pairs contained in the database.

map features far away from the ones supporting the (single) affine transformation. As a result, and as illustrated in Fig. 13(b), the mapping of those ground-truth correspondences lying on the sides of the object surface gives rise to high reprojection errors. Conversely, HMA and AMA capture more matches, distributed around the whole object's surface. Also observe that ACC sometimes tends to detect more ambiguous matches than the other methods, e.g., those at the top-right corner of the object.

F. Surgical-Images Dataset

Our dataset includes more than 100 image pairs, selected from cases of instrument occlusion, fast camera or organ motion, change of illumination, and camera retraction. In particular, the image pairs were manually selected by considering cases without blur and those with large viewpoint changes but still with high visibility of the same scene. Fig. 14 shows some representative examples of image pairs relative to cases of camera retraction, complete occlusion, or zoom. The sets of initial matches contained on average approximately 265 matches for SIFT, 233 for SURF, and 2872 for ASIFT. The set of manually-selected correspondences between each image pair contained on average 20 points; manually obtaining a higher (average) number of ground-truth corresponding corners was indeed very difficult because of the strong illumination changes and image clutter. Fig. 15(a) depicts the ROC curve for each algorithm when SIFT features are used. Note that the results of all the image pairs are integrated into the average ROC curves for each algorithm. We also include the confidence intervals9 (vertical lines) for some selected score values.10 In addition, Fig.
15(b) and (c) show the direct comparisons of sensitivity and 1-specificity between HMA and the other methods. Table III summarizes the average mapping performance of the algorithms for the different types of features, with the same parameters and thresholds as in the in-lab experiment. Fig. 16 shows qualitative examples of the matching performance of the four algorithms for SIFT, SURF, and ASIFT.

9 We use a significance level of 95% in a two-tailed test. 10 For clarity, we only present score values that represent small, medium, and large reprojection errors.

Fig. 15. MIS-dataset matching performance (better seen in color). (a) Average ROC curves of the four algorithms. The vertical lines show the 95% confidence intervals of the mean. (b) and (c) Direct comparison of HMA's sensitivity and 1-specificity against Lowe, AMA, and ACC.

IV. DISCUSSION AND CONCLUSION

In this work, we have presented our novel hierarchical multi-affine (HMA) feature-matching algorithm to find image similarities between two laparoscopic views. HMA removes incorrect matches from a given set of initial appearance-based matches by iteratively partitioning the features into clusters over the organ's surface, and by estimating an affine transformation for each cluster. This affine mapping makes it possible to recover the pixel position of tracked features that were lost after a complete and prolonged occlusion or sudden camera motions. HMA is an important tool in MIS applications and has the potential to become a core component in many surgical-vision applications. We evaluated our HMA algorithm in two controlled in-lab tests, as well as in challenging MIS scenarios. In addition, we compared HMA's performance with respect to other state-of-the-art algorithms: Lowe's [16], the adaptive multi-affine (AMA) [22], and the agglomerative correspondence clustering (ACC) [23] algorithms (cf. Section III-D). The in-lab experiment (cf.
Section III-E) demonstrated HMA's higher matching and mapping performance, and lower computational time, with respect to the other algorithms under highly-controlled rotations and deformations. In particular, from Fig. 12 it is clear that HMA outperforms AMA, Lowe, and ACC in the matching task. Table I shows that HMA also has a higher mapping accuracy than Lowe and ACC, in terms of both average error and standard deviation. Note that

Fig. 16. Qualitative example of the matching performance of the four algorithms for three different image pairs. The first column shows solutions for database image pair 13 using SIFT features; the second column contains solutions for image pair 6 using SURF features. The third column shows the solutions for image pair 38 when ASIFT features are used. (Better seen in color).

AMA is very competitive, indicating that the multi-affine approach adapts better to (nonplanar) object rotations and deformations. However, even if HMA and AMA share the same multi-affine approach, HMA was specifically designed to be faster than AMA: HMA uses a hierarchical structure that iteratively divides the feature-matching problem into smaller subproblems, thus reducing the computational cost, whereas AMA uses a brute-force strategy to determine the right number of local affine transformations, thus incurring a very large overhead. HMA's improvement over the other approaches is shown in Table II. In particular, note the large difference in computational time when the number of matches is large: in the deformation case with ASIFT, HMA is three times faster than Lowe, 39 times faster than AMA, and 914 times faster than ACC.

In the second experiment, we evaluated these feature-matching algorithms in the highly-cluttered MIS environment. Our database is particularly challenging due to the high percentage of incorrect matches after the appearance-based matching phase (larger than in the in-lab test). This happens for several reasons: few and sparse features due to large texture-less image regions; image similarities due to the ambiguous nature of surgical images; and high image distortion due to the endoscope lenses and endoscopic illumination (this last phenomenon, for example, causes many good matches to be localized around the image center).
Despite these problems, our results, illustrated in the average ROC curves in Fig. 15, in the qualitative comparison in Fig. 16, and in Table III, show that HMA achieves a great balance between speed and accuracy: HMA has high matching and mapping performance, but with a significantly reduced computational time. Furthermore, Fig. 15(a) shows that HMA only requires a score (reprojection error) of 15 pixels to detect most of the correct matches, while the other algorithms require larger thresholds and consequently detect an increased number of false positives (incorrect matches). Fig. 15(b) shows that HMA achieves better sensitivity than the other methods at the same score values, thus indicating an increased capability to detect the correct matches even at lower score thresholds. At the same time, in Fig. 15(c) HMA has performance similar to Lowe and AMA, and more stability to outliers than ACC. In particular, observe the gap between the dashed (pink) curve and the other curves in Fig. 15(c): this gap indicates a higher detection of false positives (incorrect matches) when ACC's dissimilarity threshold is increased. Also, as observed in Fig. 16, HMA is comparable with AMA, is more effective than Lowe's single-affine formulation, and is more robust than ACC [several ambiguities can be readily found by visual inspection, e.g., in Fig. 16(c)]. Furthermore, from the results in Table III, and similarly to the in-lab experiment, we observed that HMA achieves a mapping error comparable to AMA's, but at a speed comparable to Lowe's. These results demonstrate that HMA's hierarchical formulation successfully increases the convergence speed without sacrificing mapping accuracy. Also observe that, in the case of a large number of matches (e.g., when using ASIFT), HMA again achieves reduced computational times compared with Lowe, AMA, and ACC (approximately 2.57, 20, and 245 times faster, respectively).
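As an illustration of this divide-and-conquer structure, the following sketch (ours, and deliberately simplified: plain k-means on the query-keypoint positions stands in for the paper's weighted clustering, and the correction and top-down phases are omitted) recursively splits the matches until each cluster supports a single affine model. Parameter values follow Section III-D:

```python
import numpy as np

def _affine_inliers(src, dst, thresh=5.0, iters=100, rng=None):
    """Tiny RANSAC: best 3x2 affine matrix M and inlier mask (None if < 3 matches)."""
    rng = rng or np.random.default_rng(0)
    n = len(src)
    if n < 3:
        return None, None
    X = np.hstack([src, np.ones((n, 1))])            # homogeneous query points
    best = None
    for _ in range(iters):
        i = rng.choice(n, 3, replace=False)
        M, *_ = np.linalg.lstsq(X[i], dst[i], rcond=None)
        mask = np.linalg.norm(X @ M - dst, axis=1) < thresh
        if best is None or mask.sum() > best[1].sum():
            best = (M, mask)
    M, mask = best
    if mask.sum() >= 3:                              # refit on all inliers
        M, *_ = np.linalg.lstsq(X[mask], dst[mask], rcond=None)
        mask = np.linalg.norm(X @ M - dst, axis=1) < thresh
    return M, mask

def _kmeans(pts, k, iters=20, rng=None):
    """Plain k-means on 2-D points; returns a cluster label per point."""
    rng = rng or np.random.default_rng(0)
    centers = pts[rng.choice(len(pts), k, replace=False)].copy()
    for _ in range(iters):
        d = np.linalg.norm(pts[:, None] - centers[None], axis=2)
        lab = d.argmin(axis=1)
        for j in range(k):
            if (lab == j).any():
                centers[j] = pts[lab == j].mean(axis=0)
    return lab

def hma_expand(matches, depth=0, max_depth=8, k=4, min_ratio=0.9, min_consensus=6):
    """Recursive node expansion: returns a list of (affine, inlier_matches) leaves.
    `matches` is an (N, 4) array of [x1, y1, x2, y2] correspondences."""
    src, dst = matches[:, :2], matches[:, 2:]
    M, mask = _affine_inliers(src, dst)
    if M is None or mask.sum() < min_consensus:
        return []                                    # cluster discarded -> outlier buffer
    if mask.mean() >= min_ratio or depth >= max_depth or len(matches) < k * min_consensus:
        return [(M, matches[mask])]                  # stop criterion met: leaf node
    leaves = []
    lab = _kmeans(src, k)                            # split into k sub-clusters
    for j in range(k):
        if (lab == j).sum() >= 3:
            leaves += hma_expand(matches[lab == j], depth + 1,
                                 max_depth, k, min_ratio, min_consensus)
    return leaves
```

Because each recursion level runs RANSAC only on a shrinking subset of the matches, the work per node decreases quickly down the tree, which is the source of the speed-up over AMA's brute-force model search.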
In addition, we collected HMA's maximum tree depth for both experiments, and observed that it varies with the feature type (SIFT, SURF, and ASIFT). Across all the MIS image pairs, the minimum depth reported was 2 (SURF) and the maximum 62 (ASIFT). In the case of the in-lab dataset, we observed that the percentage of correct matches in each image pair is usually


More information

Object Recognition with Invariant Features

Object Recognition with Invariant Features Object Recognition with Invariant Features Definition: Identify objects or scenes and determine their pose and model parameters Applications Industrial automation and inspection Mobile robots, toys, user

More information

CS 231A Computer Vision (Winter 2018) Problem Set 3

CS 231A Computer Vision (Winter 2018) Problem Set 3 CS 231A Computer Vision (Winter 2018) Problem Set 3 Due: Feb 28, 2018 (11:59pm) 1 Space Carving (25 points) Dense 3D reconstruction is a difficult problem, as tackling it from the Structure from Motion

More information

Introduction. Introduction. Related Research. SIFT method. SIFT method. Distinctive Image Features from Scale-Invariant. Scale.

Introduction. Introduction. Related Research. SIFT method. SIFT method. Distinctive Image Features from Scale-Invariant. Scale. Distinctive Image Features from Scale-Invariant Keypoints David G. Lowe presented by, Sudheendra Invariance Intensity Scale Rotation Affine View point Introduction Introduction SIFT (Scale Invariant Feature

More information

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

More information

A Systems View of Large- Scale 3D Reconstruction

A Systems View of Large- Scale 3D Reconstruction Lecture 23: A Systems View of Large- Scale 3D Reconstruction Visual Computing Systems Goals and motivation Construct a detailed 3D model of the world from unstructured photographs (e.g., Flickr, Facebook)

More information

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov

Structured Light II. Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Structured Light II Johannes Köhler Johannes.koehler@dfki.de Thanks to Ronen Gvili, Szymon Rusinkiewicz and Maks Ovsjanikov Introduction Previous lecture: Structured Light I Active Scanning Camera/emitter

More information

CS231A Section 6: Problem Set 3

CS231A Section 6: Problem Set 3 CS231A Section 6: Problem Set 3 Kevin Wong Review 6 -! 1 11/09/2012 Announcements PS3 Due 2:15pm Tuesday, Nov 13 Extra Office Hours: Friday 6 8pm Huang Common Area, Basement Level. Review 6 -! 2 Topics

More information

Chapter 3 Image Registration. Chapter 3 Image Registration

Chapter 3 Image Registration. Chapter 3 Image Registration Chapter 3 Image Registration Distributed Algorithms for Introduction (1) Definition: Image Registration Input: 2 images of the same scene but taken from different perspectives Goal: Identify transformation

More information

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality

More information

EE795: Computer Vision and Intelligent Systems

EE795: Computer Vision and Intelligent Systems EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 FDH 204 Lecture 10 130221 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Canny Edge Detector Hough Transform Feature-Based

More information

Clustering CS 550: Machine Learning

Clustering CS 550: Machine Learning Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf

More information

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS Cognitive Robotics Original: David G. Lowe, 004 Summary: Coen van Leeuwen, s1460919 Abstract: This article presents a method to extract

More information

An Ensemble Approach to Image Matching Using Contextual Features Brittany Morago, Giang Bui, and Ye Duan

An Ensemble Approach to Image Matching Using Contextual Features Brittany Morago, Giang Bui, and Ye Duan 4474 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 11, NOVEMBER 2015 An Ensemble Approach to Image Matching Using Contextual Features Brittany Morago, Giang Bui, and Ye Duan Abstract We propose a

More information

Instance-level recognition II.

Instance-level recognition II. Reconnaissance d objets et vision artificielle 2010 Instance-level recognition II. Josef Sivic http://www.di.ens.fr/~josef INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548 Laboratoire d Informatique, Ecole Normale

More information

Image Features: Local Descriptors. Sanja Fidler CSC420: Intro to Image Understanding 1/ 58

Image Features: Local Descriptors. Sanja Fidler CSC420: Intro to Image Understanding 1/ 58 Image Features: Local Descriptors Sanja Fidler CSC420: Intro to Image Understanding 1/ 58 [Source: K. Grauman] Sanja Fidler CSC420: Intro to Image Understanding 2/ 58 Local Features Detection: Identify

More information

Overview. Augmented reality and applications Marker-based augmented reality. Camera model. Binary markers Textured planar markers

Overview. Augmented reality and applications Marker-based augmented reality. Camera model. Binary markers Textured planar markers Augmented reality Overview Augmented reality and applications Marker-based augmented reality Binary markers Textured planar markers Camera model Homography Direct Linear Transformation What is augmented

More information

Mobile Human Detection Systems based on Sliding Windows Approach-A Review

Mobile Human Detection Systems based on Sliding Windows Approach-A Review Mobile Human Detection Systems based on Sliding Windows Approach-A Review Seminar: Mobile Human detection systems Njieutcheu Tassi cedrique Rovile Department of Computer Engineering University of Heidelberg

More information

Specular 3D Object Tracking by View Generative Learning

Specular 3D Object Tracking by View Generative Learning Specular 3D Object Tracking by View Generative Learning Yukiko Shinozuka, Francois de Sorbier and Hideo Saito Keio University 3-14-1 Hiyoshi, Kohoku-ku 223-8522 Yokohama, Japan shinozuka@hvrl.ics.keio.ac.jp

More information

Outline 7/2/201011/6/

Outline 7/2/201011/6/ Outline Pattern recognition in computer vision Background on the development of SIFT SIFT algorithm and some of its variations Computational considerations (SURF) Potential improvement Summary 01 2 Pattern

More information

Eppur si muove ( And yet it moves )

Eppur si muove ( And yet it moves ) Eppur si muove ( And yet it moves ) - Galileo Galilei University of Texas at Arlington Tracking of Image Features CSE 4392-5369 Vision-based Robot Sensing, Localization and Control Dr. Gian Luca Mariottini,

More information

3D Computer Vision. Structured Light II. Prof. Didier Stricker. Kaiserlautern University.

3D Computer Vision. Structured Light II. Prof. Didier Stricker. Kaiserlautern University. 3D Computer Vision Structured Light II Prof. Didier Stricker Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de 1 Introduction

More information

Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation

Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation Obviously, this is a very slow process and not suitable for dynamic scenes. To speed things up, we can use a laser that projects a vertical line of light onto the scene. This laser rotates around its vertical

More information

A Survey of Light Source Detection Methods

A Survey of Light Source Detection Methods A Survey of Light Source Detection Methods Nathan Funk University of Alberta Mini-Project for CMPUT 603 November 30, 2003 Abstract This paper provides an overview of the most prominent techniques for light

More information

arxiv: v1 [cs.cv] 28 Sep 2018

arxiv: v1 [cs.cv] 28 Sep 2018 Extrinsic camera calibration method and its performance evaluation Jacek Komorowski 1 and Przemyslaw Rokita 2 arxiv:1809.11073v1 [cs.cv] 28 Sep 2018 1 Maria Curie Sklodowska University Lublin, Poland jacek.komorowski@gmail.com

More information

3D Reconstruction of a Hopkins Landmark

3D Reconstruction of a Hopkins Landmark 3D Reconstruction of a Hopkins Landmark Ayushi Sinha (461), Hau Sze (461), Diane Duros (361) Abstract - This paper outlines a method for 3D reconstruction from two images. Our procedure is based on known

More information

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009 Learning and Inferring Depth from Monocular Images Jiyan Pan April 1, 2009 Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer

More information

Reinforcement Matching Using Region Context

Reinforcement Matching Using Region Context Reinforcement Matching Using Region Context Hongli Deng 1 Eric N. Mortensen 1 Linda Shapiro 2 Thomas G. Dietterich 1 1 Electrical Engineering and Computer Science 2 Computer Science and Engineering Oregon

More information

WP1: Video Data Analysis

WP1: Video Data Analysis Leading : UNICT Participant: UEDIN Fish4Knowledge Final Review Meeting - November 29, 2013 - Luxembourg Workpackage 1 Objectives Fish Detection: Background/foreground modeling algorithms able to deal with

More information

Edge and corner detection

Edge and corner detection Edge and corner detection Prof. Stricker Doz. G. Bleser Computer Vision: Object and People Tracking Goals Where is the information in an image? How is an object characterized? How can I find measurements

More information

SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014

SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014 SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014 SIFT SIFT: Scale Invariant Feature Transform; transform image

More information

Rotation Invariant Image Registration using Robust Shape Matching

Rotation Invariant Image Registration using Robust Shape Matching International Journal of Electronic and Electrical Engineering. ISSN 0974-2174 Volume 7, Number 2 (2014), pp. 125-132 International Research Publication House http://www.irphouse.com Rotation Invariant

More information

Trademark Matching and Retrieval in Sport Video Databases

Trademark Matching and Retrieval in Sport Video Databases Trademark Matching and Retrieval in Sport Video Databases Andrew D. Bagdanov, Lamberto Ballan, Marco Bertini and Alberto Del Bimbo {bagdanov, ballan, bertini, delbimbo}@dsi.unifi.it 9th ACM SIGMM International

More information

Midterm Examination CS 534: Computational Photography

Midterm Examination CS 534: Computational Photography Midterm Examination CS 534: Computational Photography November 3, 2016 NAME: Problem Score Max Score 1 6 2 8 3 9 4 12 5 4 6 13 7 7 8 6 9 9 10 6 11 14 12 6 Total 100 1 of 8 1. [6] (a) [3] What camera setting(s)

More information

Viewpoint Invariant Features from Single Images Using 3D Geometry

Viewpoint Invariant Features from Single Images Using 3D Geometry Viewpoint Invariant Features from Single Images Using 3D Geometry Yanpeng Cao and John McDonald Department of Computer Science National University of Ireland, Maynooth, Ireland {y.cao,johnmcd}@cs.nuim.ie

More information

BSB663 Image Processing Pinar Duygulu. Slides are adapted from Selim Aksoy

BSB663 Image Processing Pinar Duygulu. Slides are adapted from Selim Aksoy BSB663 Image Processing Pinar Duygulu Slides are adapted from Selim Aksoy Image matching Image matching is a fundamental aspect of many problems in computer vision. Object or scene recognition Solving

More information

EN1610 Image Understanding Lab # 4: Corners, Interest Points, Hough Transform

EN1610 Image Understanding Lab # 4: Corners, Interest Points, Hough Transform EN1610 Image Understanding Lab # 4: Corners, Interest Points, Hough Transform The goal of this fourth lab is to ˆ Learn how to detect corners, and use them in a tracking application ˆ Learn how to describe

More information

Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction

Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction Marc Pollefeys Joined work with Nikolay Savinov, Christian Haene, Lubor Ladicky 2 Comparison to Volumetric Fusion Higher-order ray

More information

Visual Odometry. Features, Tracking, Essential Matrix, and RANSAC. Stephan Weiss Computer Vision Group NASA-JPL / CalTech

Visual Odometry. Features, Tracking, Essential Matrix, and RANSAC. Stephan Weiss Computer Vision Group NASA-JPL / CalTech Visual Odometry Features, Tracking, Essential Matrix, and RANSAC Stephan Weiss Computer Vision Group NASA-JPL / CalTech Stephan.Weiss@ieee.org (c) 2013. Government sponsorship acknowledged. Outline The

More information

COMPARATIVE STUDY OF IMAGE EDGE DETECTION ALGORITHMS

COMPARATIVE STUDY OF IMAGE EDGE DETECTION ALGORITHMS COMPARATIVE STUDY OF IMAGE EDGE DETECTION ALGORITHMS Shubham Saini 1, Bhavesh Kasliwal 2, Shraey Bhatia 3 1 Student, School of Computing Science and Engineering, Vellore Institute of Technology, India,

More information

III. VERVIEW OF THE METHODS

III. VERVIEW OF THE METHODS An Analytical Study of SIFT and SURF in Image Registration Vivek Kumar Gupta, Kanchan Cecil Department of Electronics & Telecommunication, Jabalpur engineering college, Jabalpur, India comparing the distance

More information

Detecting motion by means of 2D and 3D information

Detecting motion by means of 2D and 3D information Detecting motion by means of 2D and 3D information Federico Tombari Stefano Mattoccia Luigi Di Stefano Fabio Tonelli Department of Electronics Computer Science and Systems (DEIS) Viale Risorgimento 2,

More information

Cs : Computer Vision Final Project Report

Cs : Computer Vision Final Project Report Cs 600.461: Computer Vision Final Project Report Giancarlo Troni gtroni@jhu.edu Raphael Sznitman sznitman@jhu.edu Abstract Given a Youtube video of a busy street intersection, our task is to detect, track,

More information

COMPARATIVE STUDY OF DIFFERENT APPROACHES FOR EFFICIENT RECTIFICATION UNDER GENERAL MOTION

COMPARATIVE STUDY OF DIFFERENT APPROACHES FOR EFFICIENT RECTIFICATION UNDER GENERAL MOTION COMPARATIVE STUDY OF DIFFERENT APPROACHES FOR EFFICIENT RECTIFICATION UNDER GENERAL MOTION Mr.V.SRINIVASA RAO 1 Prof.A.SATYA KALYAN 2 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING PRASAD V POTLURI SIDDHARTHA

More information

Determinant of homography-matrix-based multiple-object recognition

Determinant of homography-matrix-based multiple-object recognition Determinant of homography-matrix-based multiple-object recognition 1 Nagachetan Bangalore, Madhu Kiran, Anil Suryaprakash Visio Ingenii Limited F2-F3 Maxet House Liverpool Road Luton, LU1 1RS United Kingdom

More information

Midterm Examination CS 540-2: Introduction to Artificial Intelligence

Midterm Examination CS 540-2: Introduction to Artificial Intelligence Midterm Examination CS 54-2: Introduction to Artificial Intelligence March 9, 217 LAST NAME: FIRST NAME: Problem Score Max Score 1 15 2 17 3 12 4 6 5 12 6 14 7 15 8 9 Total 1 1 of 1 Question 1. [15] State

More information

CAP 5415 Computer Vision Fall 2012

CAP 5415 Computer Vision Fall 2012 CAP 5415 Computer Vision Fall 01 Dr. Mubarak Shah Univ. of Central Florida Office 47-F HEC Lecture-5 SIFT: David Lowe, UBC SIFT - Key Point Extraction Stands for scale invariant feature transform Patented

More information

Linear combinations of simple classifiers for the PASCAL challenge

Linear combinations of simple classifiers for the PASCAL challenge Linear combinations of simple classifiers for the PASCAL challenge Nik A. Melchior and David Lee 16 721 Advanced Perception The Robotics Institute Carnegie Mellon University Email: melchior@cmu.edu, dlee1@andrew.cmu.edu

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

Towards a visual perception system for LNG pipe inspection

Towards a visual perception system for LNG pipe inspection Towards a visual perception system for LNG pipe inspection LPV Project Team: Brett Browning (PI), Peter Rander (co PI), Peter Hansen Hatem Alismail, Mohamed Mustafa, Joey Gannon Qri8 Lab A Brief Overview

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2008 CS 551, Spring 2008 c 2008, Selim Aksoy (Bilkent University)

More information

Robot localization method based on visual features and their geometric relationship

Robot localization method based on visual features and their geometric relationship , pp.46-50 http://dx.doi.org/10.14257/astl.2015.85.11 Robot localization method based on visual features and their geometric relationship Sangyun Lee 1, Changkyung Eem 2, and Hyunki Hong 3 1 Department

More information

Using Geometric Blur for Point Correspondence

Using Geometric Blur for Point Correspondence 1 Using Geometric Blur for Point Correspondence Nisarg Vyas Electrical and Computer Engineering Department, Carnegie Mellon University, Pittsburgh, PA Abstract In computer vision applications, point correspondence

More information

Harder case. Image matching. Even harder case. Harder still? by Diva Sian. by swashford

Harder case. Image matching. Even harder case. Harder still? by Diva Sian. by swashford Image matching Harder case by Diva Sian by Diva Sian by scgbt by swashford Even harder case Harder still? How the Afghan Girl was Identified by Her Iris Patterns Read the story NASA Mars Rover images Answer

More information

Part-Based Skew Estimation for Mathematical Expressions

Part-Based Skew Estimation for Mathematical Expressions Soma Shiraishi, Yaokai Feng, and Seiichi Uchida shiraishi@human.ait.kyushu-u.ac.jp {fengyk,uchida}@ait.kyushu-u.ac.jp Abstract We propose a novel method for the skew estimation on text images containing

More information

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011 Previously Part-based and local feature models for generic object recognition Wed, April 20 UT-Austin Discriminative classifiers Boosting Nearest neighbors Support vector machines Useful for object recognition

More information

Hierarchical Multi-Affine (HMA) algorithm for fast and accurate feature matching in minimally-invasive surgical images

Hierarchical Multi-Affine (HMA) algorithm for fast and accurate feature matching in minimally-invasive surgical images Hierarchical Multi-Affine (HMA) algorithm for fast and accurate feature matching in minimally-invasive surgical images Gustavo A. Puerto-Souza, and Gian Luca Mariottini Abstract The ability to find similar

More information

Features Points. Andrea Torsello DAIS Università Ca Foscari via Torino 155, Mestre (VE)

Features Points. Andrea Torsello DAIS Università Ca Foscari via Torino 155, Mestre (VE) Features Points Andrea Torsello DAIS Università Ca Foscari via Torino 155, 30172 Mestre (VE) Finding Corners Edge detectors perform poorly at corners. Corners provide repeatable points for matching, so

More information

URBAN STRUCTURE ESTIMATION USING PARALLEL AND ORTHOGONAL LINES

URBAN STRUCTURE ESTIMATION USING PARALLEL AND ORTHOGONAL LINES URBAN STRUCTURE ESTIMATION USING PARALLEL AND ORTHOGONAL LINES An Undergraduate Research Scholars Thesis by RUI LIU Submitted to Honors and Undergraduate Research Texas A&M University in partial fulfillment

More information

10/03/11. Model Fitting. Computer Vision CS 143, Brown. James Hays. Slides from Silvio Savarese, Svetlana Lazebnik, and Derek Hoiem

10/03/11. Model Fitting. Computer Vision CS 143, Brown. James Hays. Slides from Silvio Savarese, Svetlana Lazebnik, and Derek Hoiem 10/03/11 Model Fitting Computer Vision CS 143, Brown James Hays Slides from Silvio Savarese, Svetlana Lazebnik, and Derek Hoiem Fitting: find the parameters of a model that best fit the data Alignment:

More information

Local Features Tutorial: Nov. 8, 04

Local Features Tutorial: Nov. 8, 04 Local Features Tutorial: Nov. 8, 04 Local Features Tutorial References: Matlab SIFT tutorial (from course webpage) Lowe, David G. Distinctive Image Features from Scale Invariant Features, International

More information

Stereo Vision. MAN-522 Computer Vision

Stereo Vision. MAN-522 Computer Vision Stereo Vision MAN-522 Computer Vision What is the goal of stereo vision? The recovery of the 3D structure of a scene using two or more images of the 3D scene, each acquired from a different viewpoint in

More information

Multiple-Choice Questionnaire Group C

Multiple-Choice Questionnaire Group C Family name: Vision and Machine-Learning Given name: 1/28/2011 Multiple-Choice naire Group C No documents authorized. There can be several right answers to a question. Marking-scheme: 2 points if all right

More information

Recognizing Apples by Piecing Together the Segmentation Puzzle

Recognizing Apples by Piecing Together the Segmentation Puzzle Recognizing Apples by Piecing Together the Segmentation Puzzle Kyle Wilshusen 1 and Stephen Nuske 2 Abstract This paper presents a system that can provide yield estimates in apple orchards. This is done

More information

Bridging the Gap Between Local and Global Approaches for 3D Object Recognition. Isma Hadji G. N. DeSouza

Bridging the Gap Between Local and Global Approaches for 3D Object Recognition. Isma Hadji G. N. DeSouza Bridging the Gap Between Local and Global Approaches for 3D Object Recognition Isma Hadji G. N. DeSouza Outline Introduction Motivation Proposed Methods: 1. LEFT keypoint Detector 2. LGS Feature Descriptor

More information

Automatic Image Alignment (feature-based)

Automatic Image Alignment (feature-based) Automatic Image Alignment (feature-based) Mike Nese with a lot of slides stolen from Steve Seitz and Rick Szeliski 15-463: Computational Photography Alexei Efros, CMU, Fall 2006 Today s lecture Feature

More information

Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi

Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi hrazvi@stanford.edu 1 Introduction: We present a method for discovering visual hierarchy in a set of images. Automatically grouping

More information

Decision Trees Dr. G. Bharadwaja Kumar VIT Chennai

Decision Trees Dr. G. Bharadwaja Kumar VIT Chennai Decision Trees Decision Tree Decision Trees (DTs) are a nonparametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target

More information

Endoscopic Reconstruction with Robust Feature Matching

Endoscopic Reconstruction with Robust Feature Matching Endoscopic Reconstruction with Robust Feature Matching Students: Xiang Xiang Mentors: Dr. Daniel Mirota, Dr. Gregory Hager and Dr. Russell Taylor Abstract Feature matching based 3D reconstruction is a

More information

Unsupervised Learning

Unsupervised Learning Outline Unsupervised Learning Basic concepts K-means algorithm Representation of clusters Hierarchical clustering Distance functions Which clustering algorithm to use? NN Supervised learning vs. unsupervised

More information

LOCAL AND GLOBAL DESCRIPTORS FOR PLACE RECOGNITION IN ROBOTICS

LOCAL AND GLOBAL DESCRIPTORS FOR PLACE RECOGNITION IN ROBOTICS 8th International DAAAM Baltic Conference "INDUSTRIAL ENGINEERING - 19-21 April 2012, Tallinn, Estonia LOCAL AND GLOBAL DESCRIPTORS FOR PLACE RECOGNITION IN ROBOTICS Shvarts, D. & Tamre, M. Abstract: The

More information

Fully Automatic Endoscope Calibration for Intraoperative Use

Fully Automatic Endoscope Calibration for Intraoperative Use Fully Automatic Endoscope Calibration for Intraoperative Use Christian Wengert, Mireille Reeff, Philippe C. Cattin, Gábor Székely Computer Vision Laboratory, ETH Zurich, 8092 Zurich, Switzerland {wengert,

More information

A NEW FEATURE BASED IMAGE REGISTRATION ALGORITHM INTRODUCTION

A NEW FEATURE BASED IMAGE REGISTRATION ALGORITHM INTRODUCTION A NEW FEATURE BASED IMAGE REGISTRATION ALGORITHM Karthik Krish Stuart Heinrich Wesley E. Snyder Halil Cakir Siamak Khorram North Carolina State University Raleigh, 27695 kkrish@ncsu.edu sbheinri@ncsu.edu

More information

Using temporal seeding to constrain the disparity search range in stereo matching

Using temporal seeding to constrain the disparity search range in stereo matching Using temporal seeding to constrain the disparity search range in stereo matching Thulani Ndhlovu Mobile Intelligent Autonomous Systems CSIR South Africa Email: tndhlovu@csir.co.za Fred Nicolls Department

More information

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant

More information

Vision-based Mobile Robot Localization and Mapping using Scale-Invariant Features

Vision-based Mobile Robot Localization and Mapping using Scale-Invariant Features Vision-based Mobile Robot Localization and Mapping using Scale-Invariant Features Stephen Se, David Lowe, Jim Little Department of Computer Science University of British Columbia Presented by Adam Bickett

More information

University of Florida CISE department Gator Engineering. Clustering Part 4

University of Florida CISE department Gator Engineering. Clustering Part 4 Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of

More information

Visuelle Perzeption für Mensch- Maschine Schnittstellen

Visuelle Perzeption für Mensch- Maschine Schnittstellen Visuelle Perzeption für Mensch- Maschine Schnittstellen Vorlesung, WS 2009 Prof. Dr. Rainer Stiefelhagen Dr. Edgar Seemann Institut für Anthropomatik Universität Karlsruhe (TH) http://cvhci.ira.uka.de

More information

Feature Detectors and Descriptors: Corners, Lines, etc.

Feature Detectors and Descriptors: Corners, Lines, etc. Feature Detectors and Descriptors: Corners, Lines, etc. Edges vs. Corners Edges = maxima in intensity gradient Edges vs. Corners Corners = lots of variation in direction of gradient in a small neighborhood

More information