Scalable Object Classification in Range Images

Eunyoung Kim and Gerard Medioni
Institute for Robotics and Intelligent Systems, USC Viterbi School of Engineering
University of Southern California, Los Angeles, CA, USA

Abstract: We present a novel scalable framework for free-form object classification in range images. The framework comprises an automatic 3D object recognition system for range images and a scalable database structure that learns new instances and new categories efficiently. We adopt the TAX model, previously proposed for unsupervised object modeling in 2D images, to construct a hierarchical model of object classes from unlabelled range images. The hierarchical model embodies unorganized shape patterns of 3D objects of various classes in a tree structure with probabilistic distributions. A new visual vocabulary is introduced to represent a range image as a set of visual words for hierarchical model inference, classification and online learning. We also propose an online learning algorithm that, thanks to the tree structure, updates the hierarchical model efficiently whenever a new object must be learned. Extensive experiments demonstrate average classification rates of 94% on a large synthetic dataset (1,350 training images and 450 test images over 9 object classes) and 88.4% on 1,433 depth images captured with real-time range sensors. We also show that our approach outperforms the original TAX method in terms of recall rate and stability.

Keywords: object classification; range images; scalable data structure

I. INTRODUCTION

Shape-based object classification in range images aims to label the objects captured in a range image according to the common patterns they share with the other objects of the same class. Its applications include autonomous robotic navigation and manipulation, and urban scene understanding. Object classification using complete 3D models has been studied actively for content-based shape retrieval [1]. Categorizing objects in range images, however, poses an additional challenge: unlike 3D models, range images provide irregularly sampled 3D points covering only the visible surfaces of objects, and range images of the same object from different views can differ greatly. A classification method for range images should therefore be tolerant to intra-class shape variance (e.g. partial views), yet strict about distinctive inter-class shape differences. The database should also have an expandable structure, so that novel shape patterns of object classes unseen during training can be learned. We thus present a scalable object classification framework that aims to categorize 3D objects captured in range images efficiently and to update the database of object classes whenever previously unseen data is detected in the scene.

Figure 1. System overview: the red box is the focus of this paper

Fig. 1 outlines the proposed framework. Our system first segments object candidates (i.e. point clouds) from the scene and then identifies the class label of each candidate. This design rests on two observations: 1) our target applications are robotic manipulation and urban scene understanding in LIDAR images, where objects usually rest on planar surfaces (e.g. the ground) in non-cluttered environments, and 2) segmentation is inevitable if new instances are to be learned in an unsupervised manner. Object segmentation also improves run-time performance: the Spin Image [2] and Tensor matching [3] methods took 2 hrs and 90 sec per object, respectively [4], while our method takes less than 2 sec per object, including the object segmentation step.
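For concreteness, the following minimal Python sketch mirrors the control flow of Fig. 1. It is an illustration only: the three helpers are hypothetical stand-ins for the components described in Sections III, V and VI, not the system's actual interface.

    def segment_candidates(range_image):
        """Sect. III: plane fitting + clustering of resting points (stub)."""
        raise NotImplementedError

    def classify(cloud, hsd):
        """Sect. V: label inference against the hierarchical database (stub)."""
        raise NotImplementedError

    def online_update(hsd, cloud):
        """Sect. VI: local update of the hierarchical database (stub)."""
        raise NotImplementedError

    def process_frame(range_image, hsd):
        """Top-level loop of Fig. 1: segment, classify, learn if unseen."""
        for cloud in segment_candidates(range_image):
            label, is_new = classify(cloud, hsd)
            if is_new:                     # new instance or new class detected
                online_update(hsd, cloud)  # expand the database on the fly
            yield cloud, label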
The focus of our scalable classification framework is on 1) a hierarchically structured database, 2) an object classifier, and 3) an online learning process.

1) Hierarchically structured database (Sect. IV): For fast label inference and online learning, we construct a hierarchically structured database. The TAX model [5] is adopted to build the hierarchical model of object classes from unlabelled range images, by mapping each image to a path of the tree composed of L nodes. We also propose a new visual vocabulary as the object representation (Sect. III), which exploits the spatial context between a supporting planar surface and an object.

2) Object classifier (Sect. V): After segmentation, given the visible surface of a 3D object, our classifier identifies its class label. We improve on the labeling process of [5]: our method computes a normalized shape similarity between objects and imposes local patterns between visual words, which are disregarded during training.

3) Online learning (Sect. VI): We introduce an online learning approach for the TAX model, with a discussion of new-instance and new-class learning.

The outline of the paper is as follows. Section II summarizes related work and our contribution. Section III introduces the new visual vocabulary, and Section IV briefly reviews the TAX model. The proposed classification and online learning processes are detailed in Section V and Section VI, respectively. Experimental results are shown in Section VII, followed by concluding remarks.

II. RELATED WORK

There are two main approaches to shape-based object classification: global description and local description. Global description methods have been popular for content-based shape retrieval [1], [6], [7], but they are not suitable for object classification in range images, which capture noisy partial surfaces of objects. The most popular approach is therefore to extract local 3D shape descriptors from the point cloud and to recognize objects in the image by matching the descriptors against those of known objects. Splash [8] captures the distribution of orientations around a reference point, and the Spin image [2] represents, for a reference point p, a 2D histogram of the (α, β) coordinates of its neighbors, where the (α, β) coordinates span around the orientation of p. Spherical spin images [9], normal-based signatures [10], tensor-based representations [3] and point pair features [4] have also been proposed. However, these works mainly aim to recognize and localize objects whose exact shape is already stored in the database. For object classification, part-based approaches have mainly been used. Huber et al. [11] identify the class of a given object by inferring parts from similar local shape descriptors (spin images), grouping the parts of all objects into part classes, and mapping part classes to object classes. The method in [12] also groups similar local shape descriptors to form a set of shape-class components and uses surface signatures that encode the spatial relationships between the components. [13] combines spin images with other contextual features to classify objects. It is worth noting that the spin image, which most of these methods rely on, is very sensitive to the resolution of the 3D points and very slow to compute. Our contributions over these methods are as follows. We exploit the spatial context between objects and a supporting planar surface, which provides a more stable orientation and a global description of objects (e.g. height); note that noisy and unreliable depth measurements from range sensors lead to inaccurate surface orientation estimates and may weaken the performance of local shape descriptors. These works also rarely discuss how to handle a large number of range images for efficient label inference when the database must grow gradually, and they require batch re-learning every time new instances are added.
Finally, we address the shortcomings of the TAX model and suggest a method to improve its performance.

III. OBJECT REPRESENTATION IN A VISUAL VOCABULARY

The basic descriptor of the TAX model is the visual word, so every range image must be represented as a set of visual words. We design a new visual vocabulary that uses the spatial context with the ground surface as a global description, based on the observation that objects rest on planar surfaces in many cases: objects in urban areas sit on the ground surface and on the roofs and walls of buildings, and objects in indoor scenes also rest on planar surfaces for stability. Supporting planar surfaces save computation, by bounding the 3D points representing the objects of interest, and define the major orientation of objects in the scene. Much work exists on extracting planar surfaces from range images [14], [15]. In our tests, we fit planar patches using RANSAC and extract planar surfaces by grouping consistent patches. For each planar surface, we segment point clusters on the surface using the surface pose and the adjacency of points; each cluster is considered an object candidate. To compute visual words, every candidate is transformed into the ground-surface coordinate system. This makes our visual words invariant to rotation around the Y-axis (which corresponds to height above the surface), as each object is free to rotate about the Y-axis in the ground coordinate system. Note that object segmentation in heavily cluttered environments is beyond our scope.

Interest point sampling: Since a segmented point cloud usually contains thousands of unorganized 3D points, we uniformly sample interest points from it and encode them into visual words. The interest points are sampled according to surface saliency, which avoids points close to object boundaries and noisy 3D points. Surface saliency is determined by Tensor Voting (TV) [16]: given points with noise and inaccurate depth, the TV process infers a surface saliency and a more reliable surface orientation for every input point. After the TV process, the point with the highest saliency is selected as an interest point, and all of its neighbors within a certain radius are discarded from the list. This sampling is applied iteratively to the remaining points until none are left. The sampling radius depends on the resolution of the 3D points; in our experiments it was set to 1 cm.
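To make the sampling loop concrete, here is a minimal NumPy sketch. It assumes the per-point saliency has already been computed (e.g. by Tensor Voting, which is not reproduced here); the 1 cm radius matches the setting above.

    import numpy as np

    def sample_interest_points(points, saliency, radius=0.01):
        """Greedy saliency-driven sampling, as described above: repeatedly
        keep the most salient remaining point and drop every point within
        `radius` of it.
        points  : (N, 3) array in metres (ground coordinate system)
        saliency: (N,) surface saliency, assumed precomputed by Tensor Voting
        returns : indices of the selected interest points"""
        order = np.argsort(-saliency)              # most salient first
        alive = np.ones(len(points), dtype=bool)
        picked = []
        for i in order:
            if alive[i]:
                picked.append(i)
                # discard the point and all neighbours inside the radius
                d2 = np.sum((points - points[i]) ** 2, axis=1)
                alive &= d2 > radius ** 2
        return np.array(picked)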

Figure 2. Visual word assignment: (a) global descriptor; (b) six regions; (c) local descriptor

Descriptor computation: The next step is to encode every interest point into a visual word. Each visual word $w$ is an 8-dimensional integer vector ($w \in \mathbb{Z}^8$). The first two coordinates carry global information about the point (Fig. 2(a)), and the remaining six coordinates carry its local description (Fig. 2(c)). Given a 3D point $p_i = (x_i, y_i, z_i)$ with orientation $n_i = (n^i_x, n^i_y, n^i_z)$ (in the ground coordinate system), $n^i_y$ determines the value of the first coordinate, which indicates the type of surface the point $p_i$ lies on:

$$w[1] = \begin{cases} 0, & n^i_y \ge \cos(\pi/6) \\ 1, & \cos(\pi/3) \le n^i_y < \cos(\pi/6) \\ 2, & n^i_y < \cos(\pi/3) \end{cases}$$

The second coordinate encodes the height $y_i$ of the point above the ground. The object is divided into $\gamma$ regions in terms of height, and $\lfloor y_i\,\gamma / h_{\max} \rfloor$ is assigned to the second coordinate (see Fig. 2(a)). In our experiments, $\gamma$ was set to 10; an additional value is reserved for noisy points much higher than the objects we expect, so the second coordinate can take 11 different values in total. The remaining six coordinates capture the surface smoothness at the point. We align the cylinder of Fig. 2(b) to the point in order to partition its neighbors into six regions, as illustrated in Fig. 2(c). Then, for every region $r$, we compute the average orientation similarity $\theta_r = \frac{1}{|N_r|} \sum_{j \in N_r} (n_i \cdot n_j)$, where $N_r$ is the set of neighboring points in the region. Finally, the corresponding coordinate encodes the degree of surface smoothness:

$$\begin{cases} 0 \text{ (smooth)}, & \cos(\pi/6) \le \theta_r \\ 1 \text{ (weakly smooth)}, & \cos(\pi/3) \le \theta_r < \cos(\pi/6) \\ 2 \text{ (not smooth)}, & \theta_r < \cos(\pi/3) \\ 3, & \text{if the region contains no points} \end{cases}$$

For example, the point $p_i$ shown in Fig. 2(c) has no neighbors in regions A, C, D and F but smooth surface in regions B and E, so its visual word has 3s and 0s in the corresponding coordinates, respectively. As a result, our visual vocabulary has 135,168 $(= 3 \times 11 \times 4^6)$ visual words. Every interest point $d$ in image $i$ has a corresponding visual word $w_{i,d}$. Our visual words are deliberately coarse, to handle the shape variance induced by partial views and inaccurate orientations; discriminative descriptions of object classes come from the co-occurrence patterns of visual words.
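The encoding itself reduces to a few threshold tests, as in the sketch below, which follows the definitions above. One caveat: the text does not fully specify the six-region cylindrical partition of Fig. 2(b), so the region assignment here (two height halves times three angular sectors) is an illustrative guess.

    import numpy as np

    COS30, COS60 = np.cos(np.pi / 6), np.cos(np.pi / 3)

    def smoothness_code(theta):
        """Quantize an average orientation similarity into the 4 codes above."""
        return 0 if theta >= COS30 else (1 if theta >= COS60 else 2)

    def visual_word(p, n, neigh_p, neigh_n, h_max, gamma=10):
        """8-dim integer visual word of point p (unit normal n), given the
        neighbours (and their unit normals) inside the aligned cylinder."""
        w = np.zeros(8, dtype=int)
        # coordinate 1: surface type from the vertical normal component n_y
        w[0] = 0 if n[1] >= COS30 else (1 if n[1] >= COS60 else 2)
        # coordinate 2: quantized height above the ground (bin 10 = too high)
        w[1] = min(int(p[1] * gamma / h_max), gamma)
        # coordinates 3-8: smoothness code per region; 3 marks empty regions.
        rel = neigh_p - p
        sector = np.floor((np.arctan2(rel[:, 2], rel[:, 0]) + np.pi)
                          / (2 * np.pi / 3)).astype(int) % 3
        region = 3 * (rel[:, 1] >= 0).astype(int) + sector  # guessed partition
        for r in range(6):
            mask = region == r
            w[2 + r] = smoothness_code(float(np.mean(neigh_n[mask] @ n))) \
                       if mask.any() else 3
        return w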
IV. HIERARCHICAL MODEL OF OBJECT CLASSES

To build a hierarchical model of object classes, we adopt the generative model TAX, previously proposed for learning a taxonomy from a collection of unlabelled images [5]. The intuition behind this hierarchical model is to group range images with similar shape patterns into the same path, where each pattern is characterized by probabilistic models of the co-occurrence frequencies of visual words. We select the TAX model as our database structure because its tree structure enables efficient label inference and online learning. Moreover, generative models perform better with small training sets [17], and our application requires learning a few instances of a new class incrementally. This section gives a brief review of the TAX model.

The TAX model is a hierarchical variant of the Latent Dirichlet Allocation (LDA) model [18]. LDA was introduced for the unsupervised discovery of topics for document classification and is also widely used for image retrieval. While LDA has a flat structure, TAX maps each document to one of the paths of a tree; the height L of the tree must be given.

Figure 3. Graphical model of the TAX

Fig. 3 depicts the graphical model of TAX; the complete generative process is:

$$\begin{aligned} \mathrm{Tree} &\sim \mathrm{nCRP}(\gamma), & l_{i,d} &\sim \mathrm{Mult}(1/L),\\ \pi_c &\sim \mathrm{Dir}(\alpha), & \phi_t &\sim \mathrm{Dir}(\beta),\\ z_{i,d} &\sim \mathrm{Mult}(\pi_{\psi_{i,l_{i,d}}}), & w_{i,d} &\sim \mathrm{Mult}(\phi_{z_{i,d}}). \end{aligned}$$

Intuitively, in this generative model each range image i is assigned to a path ψ_i in the tree. The path is chosen under a nested Chinese restaurant process prior, nCRP(γ), where γ is a parameter controlling the branching probability [19]; the path can be either one of the existing paths in the tree or a new path split off an existing internal node. Every node c has a multinomial distribution π_c over topics, and each topic t is in turn modeled by a multinomial distribution φ_t over visual words. Both π_c and φ_t are drawn from Dirichlet distributions with hyperparameters α and β, respectively, which control the relative sparsity of these distributions. For every interest point d in image i, ψ_i is the path the image is assigned to, and z_{i,d} and l_{i,d} are the topic and level assignments of the interest point. These are the hidden variables that must be inferred from the observed interest points (i.e. the w_{i,d}) during training. As computing the exact posterior distribution of these latent variables given the observations is intractable, an approximation method, Gibbs sampling, is used: it approximates the posterior by iteratively drawing samples of z_{i,d}, l_{i,d} and ψ_i from their conditional distributions given the other hidden variables and the observations, for instance p(z_{i,d} = z | L, Ψ, W, α, β, γ), where L and Ψ are the previous level and path assignments and W denotes the observed visual words; p(l_{i,d} = l | rest) and p(ψ_i = ψ | rest) are defined similarly. More details are available in [5]. After Gibbs sampling, we finally obtain the hierarchically structured database (HSD).

Figure 4. Overview of our classification process

V. OBJECT CLASSIFICATION

After HSD inference, every training image belongs to one of the complete paths of the tree, and the multinomial distribution $\hat\phi_t$ of every topic t and the distribution $\hat\pi_c$ at every node c are estimated from the topic and level assignments:

$$\hat\phi_{t,w} = \frac{\beta + N_{t,w}}{\beta W + N_{t,\cdot}}, \qquad \hat\pi_{c,t} = \frac{\alpha + N_{c,t}}{\alpha T + N_{c,\cdot}},$$

where N_{t,w} is the number of interest points with visual word w assigned to topic t, N_{t,·} is the number of interest points assigned to topic t, N_{c,t} is the number of interest points assigned to topic t at node c, N_{c,·} is the total number of interest points assigned to node c, and W and T are the numbers of visual words and topics. When a new image j is given, the probability of observing image j given a path ψ is

$$p(j \mid \psi) = \prod_d \sum_{l,t} \hat\phi_{t,w_{j,d}}\, \hat\pi_{(\psi,l),t},$$

where w_{j,d} is the visual word of interest point d in image j and (ψ, l) denotes the node at level l of path ψ. Fig. 4 depicts our approach to object classification using the probability p(j | ψ). We first describe a naïve approach derived from the original TAX model [5]. To address its drawbacks, we then propose a novel visual vocabulary, Pattern from Neighbors (PfN), which enforces discriminative local patterns of visual words and allows a normalized shape similarity between objects to be computed.

The naïve approach is simple: it computes p(j | ψ) for every existing path ψ and keeps the σ paths with the highest similarity. Let Ψ denote this set of σ paths; Fig. 4 shows an example (red lines, σ = 2). The test image is then labeled by majority voting over the objects under the paths in Ψ. In our experiments, σ was always set to 3. Unfortunately, this approach often exhibits very poor and unstable performance, e.g. all cups labeled as bottles: if some object classes share similar shapes (e.g. the red and blue regions in Fig. 5) and the distributions at the nodes are largely inferred from those shapes, images from different classes can be led to the same path.

Figure 5. PfN visual word inference from original words: (a) cup; (b) bottle. The cylinder (red) of Fig. 2(b) is aligned to every interest point.
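As a minimal sketch of this naïve scoring step, assuming the count matrices N_{t,w} and N_{c,t} accumulated during Gibbs sampling are available as NumPy arrays (and working in log space to avoid underflow on the product over points):

    import numpy as np

    def estimate_distributions(N_tw, N_ct, alpha, beta):
        """Smoothed estimates of phi_hat (topics x words) and pi_hat
        (nodes x topics) from the Gibbs-sampling counts, as defined above."""
        W, T = N_tw.shape[1], N_ct.shape[1]
        phi = (beta + N_tw) / (beta * W + N_tw.sum(1, keepdims=True))
        pi = (alpha + N_ct) / (alpha * T + N_ct.sum(1, keepdims=True))
        return phi, pi

    def log_p_given_path(words, path_nodes, phi, pi):
        """log p(j | psi): for every interest point's word, marginalize the
        joint (level, topic) weight along the path, then sum the logs."""
        per_point = pi[path_nodes] @ phi[:, words]   # (L, D): level x point
        return float(np.log(per_point.sum(0)).sum())

    def top_paths(words, paths, phi, pi, sigma=3):
        """Naive classifier: indices of the sigma highest-scoring paths,
        whose training objects then vote on the label."""
        scores = [log_p_given_path(words, nodes, phi, pi) for nodes in paths]
        return sorted(range(len(paths)), key=lambda k: -scores[k])[:sigma]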
A. Proposed approach using the PfN vocabulary

Our idea is to reduce the ambiguity in the path by enforcing shape similarity between the test object j and the objects under the paths in Ψ. Given the set of images I under a path ψ ∈ Ψ, we compute a normalized shape similarity sim(i, j) for every i ∈ I and use it as a weight in the majority voting process, whereas the naïve approach gives equal weight to all images in I. Motivated by the observation that the ambiguity arises when the discriminative patterns of the objects (e.g. the green regions in Fig. 5) are not properly captured in the HSD, we characterize an object by a distribution over the frequencies of all its visual words and define sim(i, j) as a distance between the distributions of objects i and j. However, a distribution over the original visual vocabulary would be too sparse, given its large size (135,168 words). We therefore design a PfN visual vocabulary that captures patterns of local visual words.

A PfN visual word encodes the distance between two visual words with respect to their surface smoothness patterns. It is a 7-dimensional integer vector: given two visual words w_t and w_n, the PfN visual word f_tn is defined as

f_tn[0] = tid, f_tn[1] = h(w_t[3], w_n[3]), …, f_tn[6] = h(w_t[8], w_n[8]),

where h(x, y) is the Hamming distance between x and y, and w[x] denotes the x-th coordinate of visual word w. The first coordinate, tid, is a cluster label that links the PfN word back to the original visual word w_t: since the original vocabulary is too large, we cluster the visual words under the same path and use the cluster label as the tid of each visual word. In total, the PfN vocabulary has 576 (= 9 × 2^6) visual words (9 clusters). For every w_{i,d}, we compute all possible PfN visual words with its neighbors within a certain distance τ_d; for instance, the interest points in the blue region in Fig. 5 infer one set of PfN words from the neighbors in the same region and another from the neighbors in the red region. We then compute the distribution F_i over the PfN visual words, which represents the frequency of the words in object i: $F_{i,f_P} = N_{i,f_P} / N_{i,\cdot}$, where $N_{i,f_P}$ is the number of occurrences of the PfN visual word $f_P$ and $N_{i,\cdot}$ is the total number of PfN visual words inferred from object i. Finally, sim(i, j) is defined as the Bhattacharyya coefficient of F_i and F_j, so 0 ≤ sim(i, j) ≤ 1.
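A sketch of how the PfN histograms and the resulting weighted vote could be computed follows. One interpretive choice is worth flagging: treating h() as a 0/1 mismatch indicator per smoothness coordinate is what reproduces the quoted vocabulary size of 9 × 2^6 = 576, but the text does not spell this out, and the tolerance value tau_n below is likewise a placeholder.

    import numpy as np
    from collections import Counter

    def pfn_word(tid, w_t, w_n):
        """PfN word of a visual-word pair: the cluster id of w_t plus a 0/1
        mismatch flag per smoothness coordinate (assumed reading of h)."""
        return (tid,) + tuple(int(w_t[k] != w_n[k]) for k in range(2, 8))

    def pfn_histogram(pfn_words):
        """Normalized frequency F_i over the PfN words of one object."""
        c = Counter(pfn_words)
        total = sum(c.values())
        return {f: n / total for f, n in c.items()}

    def bhattacharyya(F_i, F_j):
        """sim(i, j) = sum_f sqrt(F_i[f] * F_j[f]), in [0, 1]."""
        return sum(np.sqrt(F_i[f] * F_j[f]) for f in F_i.keys() & F_j.keys())

    def weighted_vote(F_test, candidates, tau_n=0.1):
        """Similarity-weighted vote over (label, F_i) pairs; images with
        sim < tau_n give no support, and no support at all means the test
        object is declared a new class (tau_n is a placeholder value)."""
        votes = {}
        for label, F_i in candidates:
            s = bhattacharyya(F_i, F_test)
            if s >= tau_n:
                votes[label] = votes.get(label, 0.0) + s
        return max(votes, key=votes.get) if votes else "new-class"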

B. Classification process

Given a new image j, we first identify the set of paths Ψ whose shape patterns are most similar to image j. Then, for every path ψ ∈ Ψ:

Step 1: Let I be the objects under path ψ and V_I all the visual words of those objects. To assign a cluster id (tid), we initially group the visual words in V_I into three clusters according to orientation (i.e. w[1]), and then apply k-means clustering within each initial cluster to group the words by surface smoothness (i.e. w[3] to w[8]). In our experiments, k is set to 3. Every visual word in V_I thus receives a cluster label.

Step 2: Using the estimated cluster properties (center, dimensions), assign a cluster label to every visual word in V_j as well. Unless at least half of the visual words in V_j are coupled with the clusters from V_I, every object in I is discarded from the voting process.

Step 3: Compute sim(i, j) for every image i in I.

Step 4: Finally, label the test object j with the object class that receives the majority of the weighted votes (e.g. the red box in Fig. 4).

Extensive experiments (Table I in Sect. VII) demonstrate that our approach using the PfN vocabulary gives more stable and better labeling performance.

Range images may contain objects not present in the database, and we threshold the shape similarity sim(i, j) to identify them: during classification, if sim(i, j) is lower than a tolerance τ_n, the test image j receives no support from image i, and if the test image is supported by none of the training images under the paths in Ψ, it is labeled as a new class.

VI. SCALABLE APPROACH: ONLINE LEARNING

Sect. IV describes the batch learning process that infers an HSD from an existing dataset. In many applications, however, the HSD must grow over time, and it is infeasible to re-run the batch algorithm every time a new range image is added. We therefore propose an online learning algorithm that incrementally updates the existing HSD given a new range image. As recent related work, [20] discusses several approaches to online inference for the LDA, which has a flat structure. Algorithm 1 outlines our online learning algorithm: given an initial HSD trained by the batch algorithm, the database efficiently learns a new object i through a local update, thanks to the tree structure. M_i denotes the number of interest points in the new image i.

Algorithm 1 Pseudocode of the online learning process
 1: Determine the path ψ_i
 2: for d = 1, …, M_i do
 3:     Sample z_{i,d} using p(z_{i,d} = z | rest)
 4:     Sample l_{i,d} using p(l_{i,d} = l | rest)
 5: end for
 6: for j in R(i) do
 7:     for d = 1, …, M_j do
 8:         Sample z_{j,d} using p(z_{j,d} = z | rest)
 9:         Sample l_{j,d} using p(l_{j,d} = l | rest)
10:     end for
11: end for
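In Python, the update of Algorithm 1 has roughly the following shape. This is a sketch only: sample_z and sample_l stand for the same conditional Gibbs draws p(z = z | rest) and p(l = l | rest) used in batch training and are left abstract, and the hsd object with its assignment tables is an assumed interface.

    def online_update(hsd, i, psi_i, R, sample_z, sample_l):
        """Local HSD update of Algorithm 1. hsd.z / hsd.l hold per-point
        topic and level assignments; i is the new image id, psi_i its chosen
        path, and R the set of existing images to refresh."""
        hsd.assign_path(i, psi_i)                  # line 1: path assignment
        for d in range(hsd.num_points(i)):         # lines 2-5: new image
            hsd.z[i, d] = sample_z(hsd, i, d)
            hsd.l[i, d] = sample_l(hsd, i, d)
        for j in R:                                # lines 6-10: resample old
            for d in range(hsd.num_points(j)):     # images on nearby paths
                hsd.z[j, d] = sample_z(hsd, j, d)
                hsd.l[j, d] = sample_l(hsd, j, d)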
Unlike batch learning, where the topic, level and path assignments are sampled iteratively in turn, our online learning process for a local update starts by choosing the path ψ_i to which image i should belong, and then assigns topic and level variables to every interest point of image i by Gibbs sampling, exactly as in batch learning (lines 2-5 of Algorithm 1). The path ψ_i is either one of the existing paths or a new path; the path assignment process is discussed below. We then also resample the previous topic and level assignments of the existing images R(i) associated with the path ψ_i (lines 6-10 of Algorithm 1), in order to update the distributions related to that path by conditioning not only on the existing training data but also on the new object. We explored several approaches to selecting R(i), the set of old images to be updated for the path ψ_i; based on comparative experiments, we construct R(i) by randomly selecting images from each path that shares the deepest branching-off internal node with ψ_i. In our experiments, if the test object is a new object, the database is updated incrementally on the fly, and the updated database is used for subsequent classification. For efficiency, our online learning process expands the existing hierarchical database only when there is a new instance of an existing class or a new class. A new class is detected directly by the classification process; to identify a new instance of an existing class, we use the path assignment process, as follows.

Path assignment for a new object class: A new object class should be assigned to a new path in the tree, and p(ψ_i = ψ | rest) (Sect. IV) measures how likely object i is to belong to a new path ψ. We iteratively compute p(ψ_i = ψ | rest) and sample a path from it. Computing this probability for every possible new path in the tree would be inefficient, however, since p(ψ_i = ψ | rest) requires heavy gamma-function computations. We therefore restrict sampling to a candidate set Ψ_c. For a new object class, Ψ_c is the set Ψ_new of new paths (green lines in Fig. 6) that share nodes with the paths in Ψ (red lines), i.e. Ψ_c = Ψ_new; recall that Ψ is produced by the classification process. After iterative path sampling over Ψ_c, image i is learned under the sampled path of the HSD with a new label, through the process described in Algorithm 1.
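The restricted sampling step itself is a categorical draw. A small sketch, assuming the unnormalized log-scores log p(ψ_i = ψ | rest) have already been computed as a NumPy array, one per candidate path in Ψ_c:

    import numpy as np

    def sample_candidate_path(candidates, log_scores, rng=None):
        """Draw one path from the candidate set Psi_c in proportion to
        p(psi_i = psi | rest), given unnormalized log-scores."""
        rng = rng or np.random.default_rng()
        p = np.exp(log_scores - np.max(log_scores))  # stable normalization
        p /= p.sum()
        return candidates[rng.choice(len(candidates), p=p)]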

Figure 6. Path candidates. Black lines: existing paths; red lines: selected existing paths with high similarity to the given new image; green lines: new path candidates

Online learning for a new instance of an existing class: Even when the test object i is labeled as an existing class, the existing HSD might not capture the object properly; in our framework it is then considered a new instance of the class and should be learned into the HSD for subsequent classification. To verify whether the object is fully modeled by the existing HSD, the path assignment process is again exploited. In this case, we construct the candidate set Ψ_c to contain both the paths in Ψ (red lines in Fig. 6) and the new paths sharing nodes with the paths in Ψ (green lines), i.e. Ψ_c = Ψ_new ∪ Ψ. After iterative path sampling, unless the final sampled path is one of the existing paths of the HSD, the object i is considered a new instance and inserted into the HSD under that path by our online learning algorithm, since no existing path characterizes the object well.

VII. EXPERIMENTAL RESULTS

To acquire a large number of range images for extensive evaluation, we used 3D models freely available on the web, as it is hard to build a dataset containing range images of varied objects using only real range sensors. We randomly selected variously shaped 3D models for 9 object classes (bottle, car, chair, mug cup, desk lamp, lamp, monitor, phone and plane; 15 models per class) as the training dataset for HSD inference (Fig. 7). For every 3D model we generated ten range images at random viewpoints, all completely unlabeled. All the HSDs used in our experiments were trained from these 1,350 synthetic range images through the process described in Sect. IV. The hyperparameters α, β, γ of the Dirichlet distributions were always set to 1, 0.01 and 1, respectively, and Gibbs sampling was run for 200 iterations. Various [L, T] pairs were evaluated; due to space limits, only the results of ten HSDs learned with [3, 30] are presented.

Figure 7. 3D models in the training dataset

[Table I. Comparison of classification performance (unit: %): per-class average (AVG) and standard deviation (SD) of the recall rate for the naïve approach and for ours with PfN. The numeric entries were lost in transcription.]

A. Evaluation on synthetic data

We generated 450 range images (50 per class) from 3D models different from those in the training dataset, also rendered at random views, and validated our system on these images.

Classification performance: The first experiment compares the naïve approach with ours on the test images. For an extensive comparison, we ran both approaches on 168 different HSDs learned with various [L, T] pairs. The resulting averages and standard deviations of the recall rates are given in Table I, which shows that our approach both improves and stabilizes classification performance: on average, the recall and precision rates increase by 16% and 14%, respectively. Finally, the confusion matrix in Table II shows the classification performance of our system using the PfN vocabulary; the average recall rate is 94%.

Incremental learning performance: We also validated our incremental learning method in terms of labeling performance. We first infer an initial HSD by applying the batch learning process to randomly selected training images, and then learn the remaining images with our online learning process. Let {a, b} denote the numbers of images used in initial batch learning (a) and in online learning (b). As the quality of the final HSD is closely tied to the shape homogeneity within each path, we assess it by how well the naïve approach categorizes the test images. Fig. 8 shows the labeling performance of the HSDs trained by our online inference algorithm for different {a, b} pairs; each bar is a mean recall rate, and the stick overlaid on the bar indicates the standard deviation.

Computation time: Our batch learning took 11.7 hrs on average. Our labeling process is extremely fast.

It took 16 to 350 ms per image in our experiments, on a system with 3.0 GHz CPUs and 8 GB of RAM.

[Table II. Confusion matrix on the synthetic test set (unit: %; L = 3, T = 30). The numeric entries were garbled in transcription.]

Figure 8. Online learning performance: red: {90, 1260}; green: {450, 900}; purple: {900, 450}; blue: {1350, 0}

Figure 9. Experimental environments: (a) the SwissRanger SR3000; (b) test objects

B. Evaluation on real range images

Our scalable classification system was also validated on depth images acquired with a real-time range sensor, the SwissRanger SR3000 (Fig. 9(a)), a time-of-flight sensor that produces low-resolution depth images in real time at 25 fps. Because of the low resolution, our test objects (Fig. 9(b)) must be small enough to rest on the plate and be captured very close to the sensor. For every image, we first segment object candidates (red in Fig. 10) by identifying the supporting planar surface (yellow), and then recognize each candidate's label with our method. The path assignment process is then applied, either to identify the new path to which the object will be assigned (if the object is classified as a new class) or to verify whether the object is a new instance of an existing class. The resulting paths are accumulated until the object is no longer visible in the scene. When the object disappears, if it was classified as a new instance or a new class, the online learning process (Algorithm 1) inserts it under the new path with the highest number of votes.

Fig. 10 shows an example of the HSD inferred after 1,433 depth images were processed. It demonstrates three different cases: the test object is classified as (1) an existing instance of an existing class (mug cup/paper cup/bottle/phone, black lines), (2) a new instance of an existing class (espresso cup, blue line), or (3) a new instance of a new class (duck, red line). When the duck first appears in the scene, it is labeled as a new class and learned into the HSD; when it appears again later, it is recognized as a duck. The confusion matrix for this dataset is given in Table III; the average correct classification rate on the existing classes is about 88.4%. The attached supplementary video displays the segmentation and identification results, together with the computation time for every image.

VIII. CONCLUSION

We have presented a scalable framework that categorizes 3D objects in range images and expands to handle new data. For fast labeling and online inference, we employ a hierarchical model of object classes whose tree structure and distributions are inferred automatically from range images in an unsupervised manner. Our labeling approach using the PfN visual vocabulary improves performance, and the online inference process recognizes the path corresponding to new data and updates only the part of the tree associated with that path.

ACKNOWLEDGMENT

This work is supported by DARPA under the URGENT program. The content of this paper is approved for public release, distribution unlimited.

REFERENCES

[1] T. Funkhouser, P. Min, M. Kazhdan, J. Chen, A. Halderman, D. Dobkin, and D. Jacobs, "A search engine for 3D models," ACM ToG, vol. 22, no. 1, 2003.
[2] A. E. Johnson and M. Hebert, "Using spin images for efficient object recognition in cluttered 3D scenes," TPAMI, vol. 21, no. 5, 1999.
[3] A. S. Mian, M. Bennamoun, and R. Owens, "3-D model-based object recognition and segmentation in cluttered scenes," TPAMI, vol. 28, no. 10, 2006.

Figure 10. Example of the HSD. Each node shows the discrete probability distribution π_c (unit: 0.2) at that node. Due to limited space, we show only the paths that the test objects belong to, and the paths that share a node with the new paths and have the most training images among the branches. For every existing path, an example training object under the path is displayed in the purple box.

[Table III. Confusion matrix with real range images (unit: %), relating the test objects (mug cup, bottle, paper cup, espresso cup, duck, phone) to the trained classes plus a "New" column. The numeric entries were garbled in transcription.]

[4] B. Drost, M. Ulrich, N. Navab, and S. Ilic, "Model globally, match locally: Efficient and robust 3D object recognition," in CVPR, 2010.
[5] E. Bart, I. Porteous, P. Perona, and M. Welling, "Unsupervised learning of visual taxonomies," in CVPR, 2008.
[6] P. Daras and A. Axenopoulos, "A 3D shape retrieval framework supporting multimodal queries," IJCV, vol. 89, no. 2, 2010.
[7] P. Papadakis, I. Pratikakis, T. Theoharis, and S. Perantonis, "PANORAMA: A 3D shape descriptor based on panoramic views for unsupervised 3D object retrieval," IJCV, vol. 89, no. 2, 2010.
[8] F. Stein and G. Medioni, "Structural indexing: Efficient 3-D object recognition," TPAMI, vol. 14, no. 2, 1992.
[9] S. Ruiz-Correa, L. G. Shapiro, and M. Meila, "A new signature-based method for efficient 3-D object recognition," in CVPR, 2001.
[10] X. Li and I. Guskov, "3D object recognition from range images using pyramid matching," in ICCV, 2007.
[11] D. Huber, A. Kapuria, R. Donamukkala, and M. Hebert, "Parts-based 3D object classification," in CVPR, 2004.
[12] S. Ruiz-Correa, L. G. Shapiro, and M. Meila, "A new paradigm for recognizing 3-D object shapes from range data," in ICCV, 2003.
[13] A. Golovinskiy, V. G. Kim, and T. Funkhouser, "Shape-based recognition of 3D point clouds in urban environments," in ICCV, 2009.
[14] D. Murray and J. J. Little, "Patchlets: Representing stereo vision data with surface elements," in WACV, 2005.
[15] C. Wang, H. Tanahashi, H. Hirayu, Y. Niwa, and K. Yamamoto, "Comparison of local plane fitting methods for range data," in CVPR, 2001.
[16] G. Medioni, M.-S. Lee, and C.-K. Tang, A Computational Framework for Segmentation and Grouping. New York, NY, USA: Elsevier Science Inc., 2000.
[17] A. Y. Ng and M. I. Jordan, "On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes," in NIPS, 2001.
[18] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," JMLR, vol. 3, pp. 993-1022, 2003.
[19] D. Blei, T. Griffiths, M. Jordan, and J. Tenenbaum, "Hierarchical topic models and the nested Chinese restaurant process," in NIPS, 2004.
[20] K. R. Canini, L. Shi, and T. L. Griffiths, "Online inference of topics with latent Dirichlet allocation," in AISTATS, 2009.


More information

Factorization with Missing and Noisy Data

Factorization with Missing and Noisy Data Factorization with Missing and Noisy Data Carme Julià, Angel Sappa, Felipe Lumbreras, Joan Serrat, and Antonio López Computer Vision Center and Computer Science Department, Universitat Autònoma de Barcelona,

More information

CS 231A Computer Vision (Fall 2012) Problem Set 3

CS 231A Computer Vision (Fall 2012) Problem Set 3 CS 231A Computer Vision (Fall 2012) Problem Set 3 Due: Nov. 13 th, 2012 (2:15pm) 1 Probabilistic Recursion for Tracking (20 points) In this problem you will derive a method for tracking a point of interest

More information

Unsupervised discovery of category and object models. The task

Unsupervised discovery of category and object models. The task Unsupervised discovery of category and object models Martial Hebert The task 1 Common ingredients 1. Generate candidate segments 2. Estimate similarity between candidate segments 3. Prune resulting (implicit)

More information

Short Survey on Static Hand Gesture Recognition

Short Survey on Static Hand Gesture Recognition Short Survey on Static Hand Gesture Recognition Huu-Hung Huynh University of Science and Technology The University of Danang, Vietnam Duc-Hoang Vo University of Science and Technology The University of

More information

Liangjie Hong*, Dawei Yin*, Jian Guo, Brian D. Davison*

Liangjie Hong*, Dawei Yin*, Jian Guo, Brian D. Davison* Tracking Trends: Incorporating Term Volume into Temporal Topic Models Liangjie Hong*, Dawei Yin*, Jian Guo, Brian D. Davison* Dept. of Computer Science and Engineering, Lehigh University, Bethlehem, PA,

More information

CSE 252B: Computer Vision II

CSE 252B: Computer Vision II CSE 252B: Computer Vision II Lecturer: Serge Belongie Scribes: Jeremy Pollock and Neil Alldrin LECTURE 14 Robust Feature Matching 14.1. Introduction Last lecture we learned how to find interest points

More information

Features Points. Andrea Torsello DAIS Università Ca Foscari via Torino 155, Mestre (VE)

Features Points. Andrea Torsello DAIS Università Ca Foscari via Torino 155, Mestre (VE) Features Points Andrea Torsello DAIS Università Ca Foscari via Torino 155, 30172 Mestre (VE) Finding Corners Edge detectors perform poorly at corners. Corners provide repeatable points for matching, so

More information

MMM-classification of 3D Range Data

MMM-classification of 3D Range Data MMM-classification of 3D Range Data Anuraag Agrawal, Atsushi Nakazawa, and Haruo Takemura Abstract This paper presents a method for accurately segmenting and classifying 3D range data into particular object

More information

Continuous Multi-Views Tracking using Tensor Voting

Continuous Multi-Views Tracking using Tensor Voting Continuous Multi-Views racking using ensor Voting Jinman Kang, Isaac Cohen and Gerard Medioni Institute for Robotics and Intelligent Systems University of Southern California Los Angeles, CA 90089-073.

More information

Seminar Heidelberg University

Seminar Heidelberg University Seminar Heidelberg University Mobile Human Detection Systems Pedestrian Detection by Stereo Vision on Mobile Robots Philip Mayer Matrikelnummer: 3300646 Motivation Fig.1: Pedestrians Within Bounding Box

More information

Classifying Images with Visual/Textual Cues. By Steven Kappes and Yan Cao

Classifying Images with Visual/Textual Cues. By Steven Kappes and Yan Cao Classifying Images with Visual/Textual Cues By Steven Kappes and Yan Cao Motivation Image search Building large sets of classified images Robotics Background Object recognition is unsolved Deformable shaped

More information

CS 223B Computer Vision Problem Set 3

CS 223B Computer Vision Problem Set 3 CS 223B Computer Vision Problem Set 3 Due: Feb. 22 nd, 2011 1 Probabilistic Recursion for Tracking In this problem you will derive a method for tracking a point of interest through a sequence of images.

More information

Unsupervised Learning of Visual Taxonomies

Unsupervised Learning of Visual Taxonomies Unsupervised Learning of Visual Taxonomies Evgeniy Bart Caltech Pasadena, CA 91125 bart@caltech.edu Ian Porteous UC Irvine Irvine, CA 92697 iporteou@ics.uci.edu Pietro Perona Caltech Pasadena, CA 91125

More information

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS Cognitive Robotics Original: David G. Lowe, 004 Summary: Coen van Leeuwen, s1460919 Abstract: This article presents a method to extract

More information

Image Segmentation Using Iterated Graph Cuts Based on Multi-scale Smoothing

Image Segmentation Using Iterated Graph Cuts Based on Multi-scale Smoothing Image Segmentation Using Iterated Graph Cuts Based on Multi-scale Smoothing Tomoyuki Nagahashi 1, Hironobu Fujiyoshi 1, and Takeo Kanade 2 1 Dept. of Computer Science, Chubu University. Matsumoto 1200,

More information

Perception. Autonomous Mobile Robots. Sensors Vision Uncertainties, Line extraction from laser scans. Autonomous Systems Lab. Zürich.

Perception. Autonomous Mobile Robots. Sensors Vision Uncertainties, Line extraction from laser scans. Autonomous Systems Lab. Zürich. Autonomous Mobile Robots Localization "Position" Global Map Cognition Environment Model Local Map Path Perception Real World Environment Motion Control Perception Sensors Vision Uncertainties, Line extraction

More information

A System of Image Matching and 3D Reconstruction

A System of Image Matching and 3D Reconstruction A System of Image Matching and 3D Reconstruction CS231A Project Report 1. Introduction Xianfeng Rui Given thousands of unordered images of photos with a variety of scenes in your gallery, you will find

More information

Map-Enhanced UAV Image Sequence Registration and Synchronization of Multiple Image Sequences

Map-Enhanced UAV Image Sequence Registration and Synchronization of Multiple Image Sequences Map-Enhanced UAV Image Sequence Registration and Synchronization of Multiple Image Sequences Yuping Lin and Gérard Medioni Computer Science Department, University of Southern California 941 W. 37th Place,

More information

3D Computer Vision. Structured Light II. Prof. Didier Stricker. Kaiserlautern University.

3D Computer Vision. Structured Light II. Prof. Didier Stricker. Kaiserlautern University. 3D Computer Vision Structured Light II Prof. Didier Stricker Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de 1 Introduction

More information

Unsupervised Identification of Multiple Objects of Interest from Multiple Images: discover

Unsupervised Identification of Multiple Objects of Interest from Multiple Images: discover Unsupervised Identification of Multiple Objects of Interest from Multiple Images: discover Devi Parikh and Tsuhan Chen Carnegie Mellon University {dparikh,tsuhan}@cmu.edu Abstract. Given a collection of

More information

STRUCTURAL EDGE LEARNING FOR 3-D RECONSTRUCTION FROM A SINGLE STILL IMAGE. Nan Hu. Stanford University Electrical Engineering

STRUCTURAL EDGE LEARNING FOR 3-D RECONSTRUCTION FROM A SINGLE STILL IMAGE. Nan Hu. Stanford University Electrical Engineering STRUCTURAL EDGE LEARNING FOR 3-D RECONSTRUCTION FROM A SINGLE STILL IMAGE Nan Hu Stanford University Electrical Engineering nanhu@stanford.edu ABSTRACT Learning 3-D scene structure from a single still

More information

One-Shot Learning with a Hierarchical Nonparametric Bayesian Model

One-Shot Learning with a Hierarchical Nonparametric Bayesian Model One-Shot Learning with a Hierarchical Nonparametric Bayesian Model R. Salakhutdinov, J. Tenenbaum and A. Torralba MIT Technical Report, 2010 Presented by Esther Salazar Duke University June 10, 2011 E.

More information

Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image - Supplementary Material -

Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image - Supplementary Material - Uncertainty-Driven 6D Pose Estimation of s and Scenes from a Single RGB Image - Supplementary Material - Eric Brachmann*, Frank Michel, Alexander Krull, Michael Ying Yang, Stefan Gumhold, Carsten Rother

More information

Sketchable Histograms of Oriented Gradients for Object Detection

Sketchable Histograms of Oriented Gradients for Object Detection Sketchable Histograms of Oriented Gradients for Object Detection No Author Given No Institute Given Abstract. In this paper we investigate a new representation approach for visual object recognition. The

More information

Multi-View 3D Object Detection Network for Autonomous Driving

Multi-View 3D Object Detection Network for Autonomous Driving Multi-View 3D Object Detection Network for Autonomous Driving Xiaozhi Chen, Huimin Ma, Ji Wan, Bo Li, Tian Xia CVPR 2017 (Spotlight) Presented By: Jason Ku Overview Motivation Dataset Network Architecture

More information

The Caltech-UCSD Birds Dataset

The Caltech-UCSD Birds Dataset The Caltech-UCSD Birds-200-2011 Dataset Catherine Wah 1, Steve Branson 1, Peter Welinder 2, Pietro Perona 2, Serge Belongie 1 1 University of California, San Diego 2 California Institute of Technology

More information