
CSC494: Individual Project in Computer Science

Seyed Kamyar Seyed Ghasemipour
Department of Computer Science, University of Toronto

Abstract

One of the most important tasks in computer vision is object recognition. In this work we explore to what extent local descriptors computed at keypoints can be useful for this task. We experiment with different combinations of keypoint detectors and local feature descriptors on a dataset of point clouds of rendered CAD models as well as a dataset of point clouds of individual object instances segmented out from RGB-D images of natural scenes. Our results demonstrate that by jointly taking into account the features computed from an object, a simple nearest neighbour classification framework can achieve interesting classification performance.

1. Introduction

One of the most important components of an object recognition pipeline is the representation of the objects that is used. The most successful methods in this area iteratively build more complex, higher order representations by combining lower order ones to create more semantic representations. In this report, on the other hand, we explore the efficacy of using purely local feature descriptors, extracted at specific interest points, for the task of object recognition.

In the task of recognizing a specific object instance across multiple scenes, in the presence of clutter, and with variations in viewpoint, a standard pipeline is to employ a keypoint detection algorithm to find salient, repeatable interest points on the object, and then to encode, in a sufficiently unique manner, the detected interest points and their surrounding neighbourhoods using a feature descriptor. At test time, given a novel scene, the features of the objects of interest are compared against features extracted from the scene in a similar manner. Given sufficient consensus between the relative locations of the matched features in the scene and those on a previously seen object, the object is deemed to have been discovered in the scene.

In this work, we explore to what extent this pipeline can be used for object recognition. We experiment with combinations of two keypoint detectors and two feature descriptors on two different datasets: a dataset of point clouds generated from CAD models of 10 common household objects, and a dataset of point clouds of the same object classes extracted from natural indoor scenes.

2. Datasets

As mentioned above, we experiment with artificial as well as natural data. This allows us to more effectively analyze the ability of our approach: the artificial set provides an easy, noiseless benchmark for testing our method, and the natural, noisy set gauges transferability to real-world scenarios. Both datasets consist of point clouds of individual object instances from 10 classes of household objects. The objects considered in our experiments are: bathtub, bed, chair, desk, dresser, monitor, night stand, sofa, table, and toilet. We proceed to briefly describe the method of generation of each dataset.

2.1. Artificial Data

To generate our artificial data, we made use of a collection of CAD models (of objects from the aforementioned classes) gathered by Wu et al. [7]. To simulate having a 2.5D point cloud, given each CAD model, a uniform grid of 12 points was placed on a sphere centered at the given object [8]. Subsequently, the object was rendered from the viewpoint of cameras placed on the grid points, looking towards the center of the sphere.
The depth buffer of each of the renderings was then converted to a point cloud to generate our artificial dataset. In our experiments with the artificial data, we test our object recognition method in two scenarios: recognizing previously seen objects from novel viewpoints, and recognizing novel object instances. For the purposes of our experiments, we created three data splits:

Train Set: To create the training split, for each object class 20 instances were chosen, and for each instance, 4 of the 12 views were picked at random.

New Views Test Set: This set was created by randomly choosing 2 other views for the instances in the training set.

Novel Objects Test Set: The test set of novel object instances was created by randomly choosing 4 of 12 views for 5 different instances of each object category.

2.2. Natural Data

In our experiments, we also made use of the NYU Depth Dataset V2 [2]. This dataset consists of RGB-D images of various indoor scenes, with instance-level segmentations of the objects in the images. Using these segmentation annotations, we separated out instances of the objects of interest from the images and converted them into point clouds. The train and test sets for experiments with natural data contained 12 and 3 instances per object category respectively.

An important issue in dealing with point clouds of objects extracted from natural cluttered scenes is that they can be so highly occluded that they contain no distinguishable characteristics of the object categories they belong to. Although occlusion is an important challenge in computer vision that needs to be dealt with, our framework processes object instances in isolation and, as a result, is not able to cope with high degrees of occlusion (more on the importance of context in Section 8). Therefore, the point clouds used in our experiments were chosen by manually sifting through the extracted data and keeping those that bore resemblance to the object categories they were meant to represent. Figure 1 shows samples of object instances used in our experiments as well as examples of point clouds that were dismissed for not being representative of their classes.

Figure 1: Samples from chair, table, and desk categories (Green: used in dataset, Red: dismissed).

2.3. Data Preprocessing

Either due to measurement errors or simply due to the angle of a surface with respect to the camera, point clouds can contain stray outlier points. To remove these points from our point clouds, the mean and the standard deviation of the average distance of points to their 50 nearest neighbours were computed for each cloud. Points whose computed values were outside one standard deviation from the mean were marked as outliers and removed from the point clouds. Figure 2 shows an example point cloud before and after the removal of outliers.

Figure 2: An example point cloud before and after outlier removal.
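As a concrete illustration, the following is a minimal sketch of this preprocessing step, assuming numpy and scipy are available (the function name remove_outliers is ours; the k = 50 neighbourhood matches the description above):

```python
import numpy as np
from scipy.spatial import cKDTree

def remove_outliers(points, k=50):
    """Drop points whose average distance to their k nearest neighbours
    lies more than one standard deviation from the mean of that
    statistic over the whole cloud."""
    tree = cKDTree(points)
    # k + 1 neighbours because each point is its own nearest neighbour.
    dists, _ = tree.query(points, k=k + 1)
    avg = dists[:, 1:].mean(axis=1)      # per-point average distance
    mu, sigma = avg.mean(), avg.std()
    keep = np.abs(avg - mu) <= sigma     # within one std of the mean
    return points[keep]

# Example usage on a random stand-in for a real point cloud:
cloud = np.random.rand(2000, 3)
clean = remove_outliers(cloud)
```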
3. Keypoint Detectors

Keypoints in images or point clouds are points which are deemed more interesting than other points relative to a set of criteria. In the task of detecting the same object across different scenes, the most desirable property of keypoints is their repeatability; this ensures that the same keypoints can be detected and potentially matched across different viewpoints and scenes. Towards the goal of repeatability, most keypoint detection algorithms analyze the distribution of geometric attributes in a local neighbourhood and choose points where these distributions have high variance. This allows these algorithms to find regions with non-generic and interesting geometric structures.

Under the premise that local surface features are informative for general object recognition, keypoints also play an important role from a computational point of view. Computing local surface features at every point in a point cloud is an expensive process. Furthermore, at test time, given a new object, analyzing and comparing its features to those that have been previously seen can become an arduous task if non-parametric methods (such as k-nearest-neighbours) are used. In the following sections, we present the keypoint detection algorithms that we compared in our experiments.

3.1. Intrinsic Shape Signatures Keypoint Detector

In [9], Zhong presents a 3D shape descriptor called the Intrinsic Shape Signature (ISS). One of the keypoint detection methods that we consider is the one used in the interest point extraction step of ISS.

3.1.1. Method

The salience of a given point is determined by analyzing the eigenvalues of the scatter matrix of the points in its neighbourhood. More formally, let r_{density} denote a radius used for estimating the density of points around a point of interest, p_i, and let r_{nbhd} denote a radius used for determining its salience. Additionally, let w_i = |\{p_j : \|p_j - p_i\| < r_{density}\}| represent a measure of density. The procedure for determining whether p_i is a keypoint is as follows:

1. The weighted scatter matrix about p_i is computed as:

   SC(p_i) = \frac{\sum_{\|p_j - p_i\| < r_{nbhd}} w_j (p_j - p_i)(p_j - p_i)^T}{\sum_{\|p_j - p_i\| < r_{nbhd}} w_j}

2. The eigenvalues of SC(p_i) are computed next. Let \lambda^1_i, \lambda^2_i, \lambda^3_i represent the eigenvalues in order of decreasing magnitude.

3. Points where \lambda^2_i < \gamma_{21} \lambda^1_i and \lambda^3_i < \gamma_{32} \lambda^2_i are chosen as the potential set of keypoints (\gamma_{21} and \gamma_{32} are parameters that can be tuned).

4. The final set of keypoints is determined by non-maximal suppression using the magnitude of \lambda^3_i.

3.1.2. Intuition

The eigenvalues of SC(p_i) represent to what extent the positions of points in the local neighbourhood of p_i vary along three orthogonal axes. Therefore the behaviour of the keypoint detection method outlined above depends heavily on the values of \gamma_{21} and \gamma_{32}. If \gamma_{21} and \gamma_{32} are both large, then all points pass through step 3, and the resulting set of keypoints will contain points whose neighbours are scattered in every direction. However, on flat surfaces, such as the bottom of a chair, the magnitude of the third eigenvalue is very small in comparison to the first two. In such regions, as a result of the non-maximal suppression step, the algorithm will behave similarly to uniform sampling of points.

Setting \gamma_{21} and \gamma_{32} too small also has side-effects. If \gamma_{32} is small, the keypoints will capture flat surfaces, which is not a desirable property. Making \gamma_{21} small, however, will help to capture edges, since at a point on a 3-dimensional edge there will typically be two directions with neighbours on only one side, and a third direction with neighbours on both sides of the point.

Lastly, we observe that it would be difficult for this method to detect corners. At corners, all three eigenvalues would have similar magnitudes. Therefore, unless \gamma_{21} and \gamma_{32} are both set to be large, corners will not be detected as keypoints. Indeed, the first image in Figure 3 shows that with \gamma_{21} and \gamma_{32} set to 0.7 and 0.5 respectively, the algorithm did not choose the corners of the table as keypoints.

3.2. Harris 3D Corner Detector

The Harris corner detector is a popular method of extracting interest points from 2D images. One of the keypoint detectors that we use in our experiments is the extension of the Harris corner detector to 3D point clouds.

3.2.1. Method

Let I^{(j)}_x and I^{(j)}_y denote the gradients in the x and y directions at the point p_j in a 2D image. In the original Harris corner detection algorithm, the magnitudes of the eigenvalues of:

   M(p_i) = \sum_{\|p_j - p_i\| < r_{nbhd}} [I^{(j)}_x, I^{(j)}_y]^T [I^{(j)}_x, I^{(j)}_y]

are considered to be indicative of the existence of an edge or corner at p_i. The extension of this idea to the case of 3D point clouds, as implemented in the Point Cloud Library (PCL) [4], is to replace the matrix M with the covariance of the normals at the points in the neighbourhood of p_i:

   COV(p_i) = \sum_{\|p_j - p_i\| < r_{nbhd}} N_j N_j^T

3.2.2. Intuition

The method described in 3.1 cares about the spatial distribution of points in a local neighbourhood. However, this information does not say much about the curvature of the local region. The Harris 3D corner detector, on the other hand, does consider this information by caring about the extent to which the directions of the surface normals vary along different orthogonal directions. A comparison of keypoints detected with the two methods discussed thus far can be seen in Figures 3 and 4.
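The eigenvalue-ratio tests of 3.1.1 can be sketched as follows (non-maximal suppression is omitted; the function name and the use of scipy's cKDTree for neighbourhood queries are our own choices):

```python
import numpy as np
from scipy.spatial import cKDTree

def iss_ratio_tests(points, r_density, r_nbhd, gamma_21, gamma_32):
    """For every point, evaluate the two eigenvalue-ratio tests of
    step 3 and return the lambda_3 values used for suppression."""
    tree = cKDTree(points)
    # w_i: number of neighbours within r_density (the density measure).
    w = np.array([len(tree.query_ball_point(p, r_density)) for p in points],
                 dtype=float)
    passes = np.zeros(len(points), dtype=bool)
    lam3 = np.zeros(len(points))
    for i, p in enumerate(points):
        nbrs = tree.query_ball_point(p, r_nbhd)
        diff = points[nbrs] - p
        # Step 1: weighted scatter matrix SC(p_i).
        sc = (w[nbrs, None, None] * diff[:, :, None] * diff[:, None, :]).sum(0)
        sc /= w[nbrs].sum()
        # Step 2: eigenvalues in decreasing order of magnitude.
        lam = np.sort(np.linalg.eigvalsh(sc))[::-1]
        # Step 3: the two ratio tests.
        passes[i] = (lam[1] < gamma_21 * lam[0]) and (lam[2] < gamma_32 * lam[1])
        lam3[i] = lam[2]
    return passes, lam3
```

Step 4 would then keep only the passing points whose lambda_3 is a local maximum within the suppression radius.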

Figure 3: Keypoints detected on a table (left: ISS, right: Harris).

Figure 4: Keypoints detected on a chair using different methods (in order from left to right: ISS, Harris, Uniform).

3.3. Uniform Sampling

As mentioned before, keypoint detection methods tend to favour non-generic regions with interesting structures. However, for building representations of objects, high variance regions are not the only informative regions. The existence and the distribution of smooth and flat regions could also be valuable information. Hence, as a baseline to the two keypoint detection techniques mentioned above, we also experiment with uniformly sampling points from our point clouds as a replacement for interest point detection algorithms.

3.4. Practical Notes & Implementation Details

Two important parameters that need to be set for the computation of ISS keypoints are r_{nbhd} and the radius of non-maximal suppression. Data from our point clouds can contain a significant amount of noise (this is especially true for natural data). Therefore, we would not want these radii to be too small. After some parameter tuning, we decided to set r_{nbhd} and the radius of non-maximal suppression to be 12 and 8 times the model resolution respectively, where the model resolution of a point cloud is computed as the average distance of a point to its nearest neighbour in the cloud (see the sketch below).

For the Harris keypoint detector, we set the radius used for performing the computations to be 8 times the model resolution. Additionally, this detector requires surface normals to be computed. If the support radius used to compute the normals is too small, the computations will be susceptible to noise. On the other hand, if this radius is too large, the normals inside a local region will be very similar to one another, thereby negatively affecting the keypoint detection process. Eventually, we decided to set this parameter to be 3 times the model resolution. For uniform sampling, we randomly sampled points from the point cloud of each object.
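Since every radius above is expressed in multiples of the model resolution, the following sketch shows how that quantity can be computed (again using scipy's cKDTree; the multipliers mirror the ones reported in this section):

```python
import numpy as np
from scipy.spatial import cKDTree

def model_resolution(points):
    """Average distance from each point to its nearest neighbour."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=2)   # column 0 is the point itself
    return dists[:, 1].mean()

res = model_resolution(np.random.rand(1000, 3))   # stand-in cloud
r_nbhd = 12 * res          # ISS salience / descriptor support radius
r_suppression = 8 * res    # ISS non-maximal suppression / Harris radius
r_normals = 3 * res        # support radius for normal estimation
```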

3.5. Statistics

Table 1 presents the mean ratio of the number of keypoints detected to the number of points in the point clouds for objects in the training sets of each dataset. The larger ratios for the natural dataset are a result of the noisier nature of the data.

Table 1: Ratio of keypoints to number of points in the point cloud (rows: Natural Data, Artificial Data; columns: ISS, Harris 3D, Uniform).

4. Local Descriptors

After performing keypoint extraction on the data of interest, the next step in our experimental pipeline is to compute local descriptors at the salient points. The role of a descriptor computed at a given point is to represent the properties of the local surface around the point in a compact, yet sufficiently unique manner. Below, we discuss the two local descriptors that we used in our experiments.

4.1. Fast Point Feature Histograms

In [5], Rusu et al. present the Point Feature Histogram (PFH) as a means of capturing the geometrical properties of the neighbourhood of a point. Later work [3] presents a modification of PFH named the Fast Point Feature Histogram (FPFH), which significantly reduces the computational cost associated with the feature computation. We proceed by first describing how PFH descriptors are computed and then explain how they have been modified to produce the FPFH.

4.1.1. Point Feature Histograms

Point Feature Histograms attempt to capture the geometry of a local region by taking into account the relative directions of the normals in that region. Given a point of interest, p_q, the PFH for that point is computed as follows:

1. Let S(p_q) = \{p_t : \|p_q - p_t\| < r_{nbhd}\}, where r_{nbhd} is a hyperparameter of the feature computation.

2. For a given pair of points (p_s, p_t) in S(p_q), with normals (n_s, n_t), the point whose normal makes the smallest angle with p_t - p_s is chosen as the source.

3. Without loss of generality, we assume the source to be p_s. A frame about the point p_s is created using the orthonormal basis (u, v, w) defined as follows:

   u = n_s
   v = u \times \frac{p_t - p_s}{\|p_t - p_s\|}
   w = u \times v

4. Given this frame, the quadruplet (\alpha, \phi, \theta, d) for the pair (p_s, p_t) is formed, where:

   d = \|p_t - p_s\|
   \alpha = v \cdot n_t
   \phi = u \cdot \frac{p_t - p_s}{d}
   \theta = \arctan(w \cdot n_t, u \cdot n_t)

5. For 2.5D images, however, the distance between neighbouring points differs across viewpoints. Therefore, it is common to eliminate the d element from the quadruplet.

Figure 5: Visualization of local frame and angles used for the computation of Point Feature Histograms [1].

To create the PFH descriptor for the point p_q, the triplets (\alpha, \phi, \theta) are computed for every pair of points in S(p_q). \alpha, \phi, and \theta are each binned using 5 bins, creating a total of 5^3 bins for the triplets. The Point Feature Histogram for p_q is then taken to be the 125-dimensional histogram of the computed triplets.
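To make steps 2-5 concrete, here is a sketch computing the (\alpha, \phi, \theta) triplet for a single pair of oriented points (the function name is ours and the frame follows the definitions above; the d element is dropped as in step 5):

```python
import numpy as np

def pfh_triplet(p_s, n_s, p_t, n_t):
    """Angular features of 4.1.1 for one pair of points with unit normals."""
    diff = p_t - p_s
    d = np.linalg.norm(diff)
    # Step 2: the point whose normal makes the smaller angle with the
    # connecting line is the source; swap the roles if needed.
    if abs(np.dot(n_t, diff)) > abs(np.dot(n_s, diff)):
        p_s, p_t, n_s, n_t = p_t, p_s, n_t, n_s
        diff = -diff
    # Step 3: orthonormal frame (u, v, w) at the source point
    # (assumes n_s is not parallel to the connecting line).
    u = n_s
    v = np.cross(u, diff / d)
    v /= np.linalg.norm(v)
    w = np.cross(u, v)
    # Step 4: the angular features (step 5 drops d).
    alpha = np.dot(v, n_t)
    phi = np.dot(u, diff) / d
    theta = np.arctan2(np.dot(w, n_t), np.dot(u, n_t))
    return alpha, phi, theta
```

Binning each of the three values into 5 bins over all pairs in S(p_q) then yields the 125-dimensional PFH.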

4.1.2. Fast Point Feature Histograms

Computing Point Feature Histograms incurs a high computational cost due to the fact that the mentioned triplets are computed for every pair of points in the neighbourhood of the point under consideration. The Fast Point Feature Histogram [3] attempts to remedy this issue as follows:

1. In the first step, Simplified Point Feature Histograms (SPFH) are computed at every point in the cloud. For a given point p_q, the SPFH is computed in a very similar manner to PFH, with 2 main differences:

   - The triplets (\alpha, \phi, \theta) are only computed between p_q and its neighbours.
   - Instead of jointly binning the values of the triplets, a histogram of 11 bins is made for each of \alpha, \phi, and \theta separately, and the resulting histograms are concatenated to form a 33-dimensional feature vector.

2. In the second step, the FPFH feature for the point p_q is computed as:

   FPFH(p_q) = SPFH(p_q) + \frac{1}{|S(p_q)|} \sum_{p_k \in S(p_q)} \frac{1}{w_k} SPFH(p_k)

   where w_k is the distance of p_q to p_k.

4.1.3. Intuitions

FPFH attempts to capture the geometry of a local region by measuring how the directions of the normals in the region change relative to one another. However, without information about the relative locations of the normals, the same descriptor could potentially represent many types of surface geometries. This, in addition to the fact that \alpha, \phi, and \theta are binned independently, raises a concern about how distinctive FPFH features are in terms of representing surfaces with different structures. On a positive note, however, the fact that FPFH features only consider the relative directions of normals means that they are pose invariant and should in theory produce the same histograms when objects are seen from different viewpoints.

4.2. SHOT Descriptor

In [6], Tombari et al. present the Signature of Histograms of Orientations (SHOT). This is the second descriptor that we employed in our experiments.

4.2.1. Method

To compute the descriptor at a point p_i, the SHOT descriptor first necessitates the computation of a local reference frame. This reference frame is computed as follows:

1. Let M be:

   M = \frac{\sum_{j : d_j \le r_{nbhd}} (r_{nbhd} - d_j)(p_j - p_i)(p_j - p_i)^T}{\sum_{j : d_j \le r_{nbhd}} (r_{nbhd} - d_j)}

   where d_j = \|p_j - p_i\| and r_{nbhd} is a hand-tuned parameter.

2. The directions of the eigenvectors, sorted in decreasing order of eigenvalue magnitude, are taken in order to be the directions of the x, y, and z axes of the local reference frame. We denote these eigenvectors by x, y, z.

3. Let S^+_x = \{j : d_j \le r_{nbhd} \wedge (p_j - p_i) \cdot x \ge 0\} and S^-_x = \{j : d_j \le r_{nbhd} \wedge (p_j - p_i) \cdot (-x) > 0\}. The positive direction of the x axis for the reference frame is set to be the direction of x if |S^+_x| > |S^-_x|, and -x otherwise. This essentially means that the direction that contains the larger number of points is considered to be the positive direction.

4. The positive directions of the other axes are determined in a similar fashion.

Given the computed local reference frame at p_i, the SHOT descriptor is computed as follows:

1. A spherical grid similar to the one shown in Figure 6 is placed centered at p_i. The spherical grid has 8 divisions along the azimuth, 2 divisions along the elevation, and 2 divisions of the distance of a point to the center of the sphere.

2. For each division in the grid independently, a histogram is created by binning the values of cos(\theta_j), where \theta_j is the angle between the surface normal at a point p_j inside the division and the surface normal at p_i. 11 bins are used for this computation.

3. The computed histograms from the divisions are concatenated together and the resulting vector is normalized so that its components sum to one.

Figure 6: Spherical grid used in the computation of the SHOT descriptor.

4.2.2. Intuitions

The method outlined above attempts to create a more fine-grained descriptor. There is, however, a significant amount of ambiguity that results from solely binning the values of cos(\theta_j). Although \theta_j tells us to what extent a normal deviates from the normal at p_i, it does not tell us in which direction it deviates. This is a significant amount of ambiguity, and many different types of surfaces could potentially produce the same descriptor.
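The reference-frame construction of 4.2.1 reduces to a weighted eigen-decomposition plus a sign check. A sketch under those definitions follows (the function name is ours, and the fix-up of step 4 is simplified here to the same per-axis majority vote as step 3):

```python
import numpy as np

def shot_reference_frame(p_i, cloud, r_nbhd):
    """Local reference frame of 4.2.1 at p_i from its neighbourhood."""
    diff = cloud - p_i
    d = np.linalg.norm(diff, axis=1)
    mask = d <= r_nbhd
    diff, wgt = diff[mask], r_nbhd - d[mask]
    # Step 1: distance-weighted covariance M.
    M = (wgt[:, None, None] * diff[:, :, None] * diff[:, None, :]).sum(0)
    M /= wgt.sum()
    # Step 2: eigenvectors sorted by decreasing eigenvalue.
    vals, vecs = np.linalg.eigh(M)
    axes = vecs[:, np.argsort(vals)[::-1]].T   # rows: x, y, z
    # Steps 3-4: point each axis towards the side with more neighbours.
    for k in range(3):
        if (diff @ axes[k] >= 0).sum() < (diff @ axes[k] < 0).sum():
            axes[k] = -axes[k]
    return axes
```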

4.3. Practical Notes & Implementation Details

To use the local descriptors for the purpose of object recognition, we decided to choose a relatively large value for r_{nbhd}, so as to capture the geometry of a larger region and to deal with the presence of noise. We set this value to 12 times the resolution of the point clouds. As in the keypoint detection stage, the support radius for normal computation was set to 3 times the resolution of the point clouds.

There also exists a practical issue when working with SHOT descriptors: they are very high dimensional (352 dimensions for SHOT vs. 33 for FPFH). This creates a computational problem for performing nearest neighbour queries. To resolve this issue, using PCA, 30 highly informative orthogonal axes of the SHOT descriptors were identified from the training set. Subsequently, all SHOT descriptors from the training and test sets were preprocessed by being projected onto the derived axes (this was done for each dataset independently).

5. A Priori Expectations

A priori, we do not expect the method we employ in this work to produce amazing results. To begin with, the features that we extract are local surface descriptors. It is quite possible to significantly modify the local structure of points in a point cloud (for example, by adding sufficient amounts of noise) while still preserving a global structure that allows for the recognition of the object by a human, but not by our method. Additionally, in our framework, we do not take into account the relative positions of the keypoints at which we extract features. Even by encoding this information, we would still be required to deal with the ambiguities associated with working with rotation invariant feature descriptors.

However, if we observe positive results in our experiments, this will indicate that the local features we extract are able to represent non-trivial aspects of the objects they are derived from, providing motivation for future work to attempt to incorporate this information with more global properties in order to build better representations of objects.

6. Experiments

In this section we discuss the experiments we carried out using our pipeline. As mentioned in Section 2, we worked with both an artificial and a natural dataset. For the artificial dataset, we experimented with recognizing previously seen objects from new viewpoints in addition to recognizing novel objects. For the natural dataset, however, we only experiment with recognizing previously unseen objects from the given object categories.

6.1. Distinctiveness of Individual Keypoints

As a first experiment, it is interesting to explore to what extent individual features extracted from objects are indicative of the class of the object. To this effect, we performed two tests.

6.1.1. k-NN

If individual features extracted at keypoints are representative of the object category, nearest neighbours classification of the features should produce results better than chance. For a given value of k, nearest neighbours classification was done by first determining the k nearest neighbours to a query point, q, and then using a voting mechanism in which the neighbours vote for their object class with a weight inversely proportional to their distance from q. The plots in Figure 9 show the classification accuracy of k-NN for k in {1, 3, 5, 7, 9} on both datasets for every combination of keypoint extractor and feature descriptor.
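A sketch of this distance-weighted voting rule (the names are ours; a small epsilon guards against zero distances when a training descriptor is queried against itself):

```python
import numpy as np
from collections import defaultdict

def knn_classify(query, train_feats, train_labels, k=5, eps=1e-9):
    """Distance-weighted k-NN vote over individual descriptors (6.1.1)."""
    dists = np.linalg.norm(train_feats - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = defaultdict(float)
    for i in nearest:
        votes[train_labels[i]] += 1.0 / (dists[i] + eps)  # inverse-distance weight
    return max(votes, key=votes.get)
```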
First, the fact that we obtain results consistently better than chance indicates that there may indeed exist information that can be leveraged from the local descriptors. Interestingly, the plot for the new views test set of the artificial data seems to produce a strict ordering of the quality of the keypoint/descriptor combinations. Since good performance on this test set requires repeatability of the keypoints and distinctiveness of the descriptors, the results indicate that ISS keypoints are more repeatable than Harris corners, and that FPFH features better describe the local surfaces of the objects. On the novel object test sets of both datasets, however, uniform sampling achieves performance close to that of the keypoint-based methods, and the benefit of having keypoints diminishes.

Lastly, we note that FPFH based methods do not achieve 100% accuracy for 1-NN on the training sets. This is due to the fact that FPFH features for various points end up having the exact same description. This is related to our previous concern regarding the distinctiveness of FPFH, but we do not know how to reconcile it with their superior performance discussed in the previous paragraph.

6.1.2. k-means

If the extracted features are indicative of class, then after performing k-means clustering on the descriptors, the distribution of class labels in each cluster should be non-uniform and heavily skewed. To perform this analysis, for each dataset, the training set was clustered using k-means with 50 clusters. Subsequently, the features from the test sets were assigned to the cluster with the closest center. Figures 16 through 44 demonstrate the distribution of class labels inside the clusters for the various combinations of keypoints and descriptors. In the visualizations, columns indicate class and rows indicate clusters; the brightness of a pixel indicates the proportion of data in that cluster belonging to the particular class.
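The matrices visualized in those figures can be computed as in the following sketch, assuming cluster centers from any k-means implementation (50 clusters in our case) and integer class labels:

```python
import numpy as np

def cluster_label_distribution(features, labels, centers, n_classes):
    """Rows: clusters; columns: classes; entries: label proportions."""
    # Assign each feature to its closest cluster center.
    assign = np.argmin(((features[:, None, :] - centers[None]) ** 2).sum(-1),
                       axis=1)
    counts = np.zeros((len(centers), n_classes))
    for a, y in zip(assign, labels):
        counts[a, y] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(rows, 1)   # avoid dividing empty clusters by zero
```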

Figure 9: k-NN accuracy for classification of individual features (panels: artificial data, natural data).

The results show that for the artificial dataset, some structure is retained in the distribution patterns across the train, new views test, and novel instances test sets. Also, the images for SHOT tend to be brighter than those for FPFH, which indicates that the clusters from SHOT descriptors have a more uniform distribution. For the natural dataset, however, the distributions of class labels inside the clusters do not stay the same between the train and test sets. This is not unexpected, since in natural data, due to significant amounts of noise, the repeatability of features is quite hindered.

6.2. Vote Aggregation

Individual features extracted from point clouds are susceptible to noise, and as can be seen in the results from 6.1.1, classification using these features independently, although above chance, does not produce very good results.

Figure 16: Distribution of labels in each of 50 clusters for the TRAIN set of the ARTIFICIAL DATA (panels, left to right: FPFH + ISS, FPFH + Harris, FPFH + Uniform, SHOT + ISS, SHOT + Harris, SHOT + Uniform).

Figure 23: Distribution of labels in each of 50 clusters for the NEW VIEWS TEST set of the ARTIFICIAL DATA (same panel layout).

Figure 30: Distribution of labels in each of 50 clusters for the NOVEL INSTANCES TEST set of the ARTIFICIAL DATA (same panel layout).

In this experiment, instead, the individual features of an object each voted for a class determined by the k-NN classification decision. The majority vote of the predicted classes was taken as the prediction for the full object. Figure 47 presents the results obtained from this experiment.

At first glance, classification performance for the artificial dataset is improved very significantly, whereas for natural data, the results are a mix of performance gains and losses for the different types of keypoint-descriptor pairs. Looking at the results obtained from the various test sets, what is quite odd is that the relative ordering of how well the different keypoint-descriptor combinations perform is not preserved; for the artificial dataset, the SHOT descriptor actually does a better job than FPFH, whereas the reverse was the case when classifying individual features. We do not have a justification for why this may have occurred. One aspect that is preserved between the results from 6.1.1 and here is that keypoint based feature computation results in noticeably better accuracies only for the new views test set of the artificial dataset, and the performance gap shrinks when novel object instances are considered.
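For reference, the vote aggregation described above amounts to the following sketch, reusing the knn_classify helper from 6.1.1:

```python
from collections import Counter

def classify_object(object_feats, train_feats, train_labels, k=5):
    """Majority vote over per-feature k-NN predictions (6.2)."""
    preds = [knn_classify(f, train_feats, train_labels, k) for f in object_feats]
    return Counter(preds).most_common(1)[0][0]
```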

Figure 37: Distribution of labels in each of 50 clusters for the TRAIN set of the NATURAL DATA (same panel layout).

Figure 44: Distribution of labels in each of 50 clusters for the TEST set of the NATURAL DATA (same panel layout).

6.3. Histogram of Histograms

The last experiment that we performed was to use a global representation of the objects derived from the local feature descriptors. The representation that we used was computed as follows (a code sketch appears at the end of this subsection):

1. Features from the training set were clustered using k-means with 50 clusters.

2. For objects in both the train and test sets, a histogram with 50 bins was created, where each bin counts the number of features from the given object that belong to a given cluster.

3. The histograms, normalized so that the values in the bins sum to one, were subsequently used as the representations of the objects.

Classification using these representations was done in a similar fashion to experiment 6.1.1, with the difference that here the data points are the computed histograms. The plots in Figure 50 present the classification results. This set of plots shares similar properties with the two previous ones from experiments 6.1.1 and 6.2 (such as the keypoints being more relevant to the new views test set than to the novel objects test set), and the performance on the artificial dataset seems to lie in between those of experiments 6.1.1 and 6.2. However, the most interesting result from this experiment is that we were able to significantly improve classification accuracy on the natural data test set, which we were not able to do in experiment 6.2. This is quite surprising to us, since the results from Section 6.1.2 hinted that the distribution of prototypical features changes significantly between the natural data train and test sets (further discussion in the conclusions section).
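A sketch of steps 1-3 for a single object, given cluster centers obtained from the training features (the function name is ours):

```python
import numpy as np

def object_histogram(object_feats, centers):
    """Normalized histogram of prototype assignments for one object (6.3)."""
    assign = np.argmin(((object_feats[:, None, :] - centers[None]) ** 2).sum(-1),
                       axis=1)
    hist = np.bincount(assign, minlength=len(centers)).astype(float)
    return hist / hist.sum()   # bins sum to one
```

Classification then applies the weighted k-NN of 6.1.1 to these 50-dimensional histograms.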

Figure 47: k-NN accuracy for instance-level classification (panels: artificial data, natural data).

7. Conclusions

In this work, we explored to what extent local descriptors computed at keypoints can be useful for object recognition. We experimented with FPFH and SHOT descriptors computed at keypoints obtained using 3 different methods: ISS keypoints, Harris 3D corners, and uniform sampling. To adapt this framework from instance detection to object recognition, we used larger radii of salience to capture information that is less local.

Our experiments show that individual features on their own do not contain enough information for good classification. However, when the predictions for the individual features from an object are combined to make a judgement about the object class, there is a very significant improvement in performance for the artificial dataset. This, however, does not improve classification accuracy for natural data significantly. The reason could be that, due to the significant amount of noise in natural data, keypoint detection does not perform well and the descriptors do not represent the true surface shape. Additionally, we note that the similarity of accuracies on the novel objects artificial test set with or without the use of keypoints (keypoints vs. uniform sampling) could indicate that smooth regions are also informative of the class of objects.

Lastly, an interesting result that we observed was that classification results for natural data are significantly improved in the experiment of Section 6.3. The results on artificial data indicated that, when considered jointly, the computed features can be useful for recognizing the class of an object. Representing objects using histograms of prototypical features can be considered as doing the same, with the added benefit of robustness to noise due to substituting features with their prototypes (cluster centers). Since natural data are noisy, the results on those test sets were improved, whereas the results for artificial data were comparable to those obtained in experiment 6.2.

Figure 50: k-NN accuracy for classification using normalized histograms of prototypical features (panels: artificial data, natural data).

8. Future Directions

The work in this report is applicable to a constrained situation. First, we required that object instances be pre-segmented. Segmenting out individual objects from cluttered scenes is a very difficult task, and if we do not perform the segmentation, our feature computations will be inaccurate. Second, we only experimented with non-occluded (artificial data) or not-heavily-occluded (natural data) objects, although self-occlusion was present in our data. Occlusion will also pose a major problem for us, as our best results were achieved by aggregating votes from the features over the entire object.

Another limitation of our approach is that we treat objects in isolation. Context is extremely important in helping with classification, especially when significant degrees of occlusion come into play. For example, if we can recognize some chairs in a scene, then a flat plane near the chairs would likely be a table.

We would expect that modelling these interactions between object categories would be extremely valuable for object recognition.

Lastly, in this work we showed that local features can be combined to capture discriminative properties of object categories. Taking this a step further would be to create a hierarchical representation of objects using features computed in a fashion similar to ours. Furthermore, one could envision a method in which local descriptors computed at keypoints are combined with representations of the smooth surfaces of an object to better capture the varying geometries of different object categories.

References

[1] Point feature histograms estimation documentation. Point Cloud Library.

[2] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from RGBD images. In ECCV, 2012.

[3] R. B. Rusu. Semantic 3D Object Maps for Everyday Manipulation in Human Living Environments. PhD thesis, Computer Science department, Technische Universitaet Muenchen, Germany, October 2009.

[4] R. B. Rusu and S. Cousins. 3D is here: Point Cloud Library (PCL). In IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, May 2011.

[5] R. B. Rusu, Z. C. Marton, N. Blodow, and M. Beetz. Persistent point feature histograms for 3D point clouds. In Proc. 10th Int. Conf. on Intelligent Autonomous Systems (IAS-10), Baden-Baden, Germany, 2008.

[6] F. Tombari, S. Salti, and L. Di Stefano. Unique signatures of histograms for local surface description. In Computer Vision - ECCV 2010. Springer, 2010.

[7] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.

[8] J. Xiao, T. Fang, P. Zhao, M. Lhuillier, and L. Quan. Image-based street-side city modeling. In ACM Transactions on Graphics (TOG), volume 28, page 114. ACM, 2009.

[9] Y. Zhong. Intrinsic shape signatures: A shape descriptor for 3D object recognition. In Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on. IEEE, 2009.


Feature Based Registration - Image Alignment

Feature Based Registration - Image Alignment Feature Based Registration - Image Alignment Image Registration Image registration is the process of estimating an optimal transformation between two or more images. Many slides from Alexei Efros http://graphics.cs.cmu.edu/courses/15-463/2007_fall/463.html

More information

Using Geometric Blur for Point Correspondence

Using Geometric Blur for Point Correspondence 1 Using Geometric Blur for Point Correspondence Nisarg Vyas Electrical and Computer Engineering Department, Carnegie Mellon University, Pittsburgh, PA Abstract In computer vision applications, point correspondence

More information

Ensemble of Bayesian Filters for Loop Closure Detection

Ensemble of Bayesian Filters for Loop Closure Detection Ensemble of Bayesian Filters for Loop Closure Detection Mohammad Omar Salameh, Azizi Abdullah, Shahnorbanun Sahran Pattern Recognition Research Group Center for Artificial Intelligence Faculty of Information

More information

The SIFT (Scale Invariant Feature

The SIFT (Scale Invariant Feature The SIFT (Scale Invariant Feature Transform) Detector and Descriptor developed by David Lowe University of British Columbia Initial paper ICCV 1999 Newer journal paper IJCV 2004 Review: Matt Brown s Canonical

More information

Scenario/Motivation. Introduction System Overview Surface Estimation Geometric Category and Model Applications Conclusions

Scenario/Motivation. Introduction System Overview Surface Estimation Geometric Category and Model Applications Conclusions Single View Categorization and Modelling Nico Blodow, Zoltan-Csaba Marton, Dejan Pangercic, Michael Beetz Intelligent Autonomous Systems Group Technische Universität München IROS 2010 Workshop on Defining

More information

Object and Class Recognition I:

Object and Class Recognition I: Object and Class Recognition I: Object Recognition Lectures 10 Sources ICCV 2005 short courses Li Fei-Fei (UIUC), Rob Fergus (Oxford-MIT), Antonio Torralba (MIT) http://people.csail.mit.edu/torralba/iccv2005

More information

Facial Expression Classification with Random Filters Feature Extraction

Facial Expression Classification with Random Filters Feature Extraction Facial Expression Classification with Random Filters Feature Extraction Mengye Ren Facial Monkey mren@cs.toronto.edu Zhi Hao Luo It s Me lzh@cs.toronto.edu I. ABSTRACT In our work, we attempted to tackle

More information

Wikipedia - Mysid

Wikipedia - Mysid Wikipedia - Mysid Erik Brynjolfsson, MIT Filtering Edges Corners Feature points Also called interest points, key points, etc. Often described as local features. Szeliski 4.1 Slides from Rick Szeliski,

More information

A NEW FEATURE BASED IMAGE REGISTRATION ALGORITHM INTRODUCTION

A NEW FEATURE BASED IMAGE REGISTRATION ALGORITHM INTRODUCTION A NEW FEATURE BASED IMAGE REGISTRATION ALGORITHM Karthik Krish Stuart Heinrich Wesley E. Snyder Halil Cakir Siamak Khorram North Carolina State University Raleigh, 27695 kkrish@ncsu.edu sbheinri@ncsu.edu

More information

Det De e t cting abnormal event n s Jaechul Kim

Det De e t cting abnormal event n s Jaechul Kim Detecting abnormal events Jaechul Kim Purpose Introduce general methodologies used in abnormality detection Deal with technical details of selected papers Abnormal events Easy to verify, but hard to describe

More information

Lecture 8 Object Descriptors

Lecture 8 Object Descriptors Lecture 8 Object Descriptors Azadeh Fakhrzadeh Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University 2 Reading instructions Chapter 11.1 11.4 in G-W Azadeh Fakhrzadeh

More information

IMAGE RETRIEVAL USING VLAD WITH MULTIPLE FEATURES

IMAGE RETRIEVAL USING VLAD WITH MULTIPLE FEATURES IMAGE RETRIEVAL USING VLAD WITH MULTIPLE FEATURES Pin-Syuan Huang, Jing-Yi Tsai, Yu-Fang Wang, and Chun-Yi Tsai Department of Computer Science and Information Engineering, National Taitung University,

More information

Prof. Feng Liu. Spring /26/2017

Prof. Feng Liu. Spring /26/2017 Prof. Feng Liu Spring 2017 http://www.cs.pdx.edu/~fliu/courses/cs510/ 04/26/2017 Last Time Re-lighting HDR 2 Today Panorama Overview Feature detection Mid-term project presentation Not real mid-term 6

More information

Chapter 4. Clustering Core Atoms by Location

Chapter 4. Clustering Core Atoms by Location Chapter 4. Clustering Core Atoms by Location In this chapter, a process for sampling core atoms in space is developed, so that the analytic techniques in section 3C can be applied to local collections

More information

Image matching. Announcements. Harder case. Even harder case. Project 1 Out today Help session at the end of class. by Diva Sian.

Image matching. Announcements. Harder case. Even harder case. Project 1 Out today Help session at the end of class. by Diva Sian. Announcements Project 1 Out today Help session at the end of class Image matching by Diva Sian by swashford Harder case Even harder case How the Afghan Girl was Identified by Her Iris Patterns Read the

More information

Image Features. Work on project 1. All is Vanity, by C. Allan Gilbert,

Image Features. Work on project 1. All is Vanity, by C. Allan Gilbert, Image Features Work on project 1 All is Vanity, by C. Allan Gilbert, 1873-1929 Feature extrac*on: Corners and blobs c Mo*va*on: Automa*c panoramas Credit: Ma9 Brown Why extract features? Mo*va*on: panorama

More information

Object Detection by 3D Aspectlets and Occlusion Reasoning

Object Detection by 3D Aspectlets and Occlusion Reasoning Object Detection by 3D Aspectlets and Occlusion Reasoning Yu Xiang University of Michigan Silvio Savarese Stanford University In the 4th International IEEE Workshop on 3D Representation and Recognition

More information

Clustering and Visualisation of Data

Clustering and Visualisation of Data Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some

More information

Edge and corner detection

Edge and corner detection Edge and corner detection Prof. Stricker Doz. G. Bleser Computer Vision: Object and People Tracking Goals Where is the information in an image? How is an object characterized? How can I find measurements

More information

Applying Synthetic Images to Learning Grasping Orientation from Single Monocular Images

Applying Synthetic Images to Learning Grasping Orientation from Single Monocular Images Applying Synthetic Images to Learning Grasping Orientation from Single Monocular Images 1 Introduction - Steve Chuang and Eric Shan - Determining object orientation in images is a well-established topic

More information

Learning 3D Part Detection from Sparsely Labeled Data: Supplemental Material

Learning 3D Part Detection from Sparsely Labeled Data: Supplemental Material Learning 3D Part Detection from Sparsely Labeled Data: Supplemental Material Ameesh Makadia Google New York, NY 10011 makadia@google.com Mehmet Ersin Yumer Carnegie Mellon University Pittsburgh, PA 15213

More information

String distance for automatic image classification

String distance for automatic image classification String distance for automatic image classification Nguyen Hong Thinh*, Le Vu Ha*, Barat Cecile** and Ducottet Christophe** *University of Engineering and Technology, Vietnam National University of HaNoi,

More information

Motion Estimation and Optical Flow Tracking

Motion Estimation and Optical Flow Tracking Image Matching Image Retrieval Object Recognition Motion Estimation and Optical Flow Tracking Example: Mosiacing (Panorama) M. Brown and D. G. Lowe. Recognising Panoramas. ICCV 2003 Example 3D Reconstruction

More information

Removing Shadows from Images

Removing Shadows from Images Removing Shadows from Images Zeinab Sadeghipour Kermani School of Computing Science Simon Fraser University Burnaby, BC, V5A 1S6 Mark S. Drew School of Computing Science Simon Fraser University Burnaby,

More information

3D model classification using convolutional neural network

3D model classification using convolutional neural network 3D model classification using convolutional neural network JunYoung Gwak Stanford jgwak@cs.stanford.edu Abstract Our goal is to classify 3D models directly using convolutional neural network. Most of existing

More information

Image processing and features

Image processing and features Image processing and features Gabriele Bleser gabriele.bleser@dfki.de Thanks to Harald Wuest, Folker Wientapper and Marc Pollefeys Introduction Previous lectures: geometry Pose estimation Epipolar geometry

More information

CS233: The Shape of Data Handout # 3 Geometric and Topological Data Analysis Stanford University Wednesday, 9 May 2018

CS233: The Shape of Data Handout # 3 Geometric and Topological Data Analysis Stanford University Wednesday, 9 May 2018 CS233: The Shape of Data Handout # 3 Geometric and Topological Data Analysis Stanford University Wednesday, 9 May 2018 Homework #3 v4: Shape correspondences, shape matching, multi-way alignments. [100

More information

SCALE INVARIANT FEATURE TRANSFORM (SIFT)

SCALE INVARIANT FEATURE TRANSFORM (SIFT) 1 SCALE INVARIANT FEATURE TRANSFORM (SIFT) OUTLINE SIFT Background SIFT Extraction Application in Content Based Image Search Conclusion 2 SIFT BACKGROUND Scale-invariant feature transform SIFT: to detect

More information

CS229: Action Recognition in Tennis

CS229: Action Recognition in Tennis CS229: Action Recognition in Tennis Aman Sikka Stanford University Stanford, CA 94305 Rajbir Kataria Stanford University Stanford, CA 94305 asikka@stanford.edu rkataria@stanford.edu 1. Motivation As active

More information

Object Category Detection. Slides mostly from Derek Hoiem

Object Category Detection. Slides mostly from Derek Hoiem Object Category Detection Slides mostly from Derek Hoiem Today s class: Object Category Detection Overview of object category detection Statistical template matching with sliding window Part-based Models

More information

3D Keypoints Detection for Objects Recognition

3D Keypoints Detection for Objects Recognition 3D Keypoints Detection for Objects Recognition Ayet Shaiek 1, and Fabien Moutarde 1 1 Robotics laboratory (CAOR) Mines ParisTech 60 Bd St Michel, F-75006 Paris, France Abstract - In this paper, we propose

More information