Efficient Unsupervised Learning for Localization and Detection in Object Categories

Nicolas Loeff, Himanshu Arora
ECE Department, University of Illinois at Urbana-Champaign

Alexander Sorokin, David Forsyth
Computer Science Department, University of Illinois at Urbana-Champaign

Abstract

We describe a novel method for learning templates for recognition and localization of objects drawn from categories. A generative model represents the configuration of multiple object parts with respect to an object coordinate system; these parts in turn generate image features. The complexity of the model in the number of features is low, meaning our model is much more efficient to train than comparable methods. Moreover, a variational approximation is introduced that allows learning to be orders of magnitude faster than previous approaches while incorporating many more features. This results in improvements in both accuracy and localization. Our model has been carefully tested on standard datasets; we compare with a number of recent template models. In particular, we demonstrate state-of-the-art results for detection and localization.

1 Introduction

Building appropriate object models is central to object recognition, which is a fundamental problem in computer vision. Desirable characteristics of a model include a good representation of objects and fast, efficient learning algorithms that require as little supervised information as possible. We believe an appropriate representation of an object should allow for both detection of its presence and localization ("where is it?"). So far, the quality of object recognition in the literature has been measured by detection performance only.

Viola and Jones [1] present a fast object detection system that boosts Haar filter responses. Another effective discriminative approach is the bag of keypoints [2, 3], which clusters image patches using appearance only, disregarding geometric information. Its detection performance is among the state of the art. However, since no geometric cues are used during training, features that do not belong to the object can be incorporated into the object model. This is similar to classic overfitting and typically leads to problems in object localization. Weber et al. [4] represent an object as a constellation of parts. Fergus et al. [5] extend the model to account for variability in appearance. The model encodes a template as a set of feature-generating parts, each of which generates at most one feature. As a result, the complexity is determined by the hardness of the part-feature assignment. Heuristic search is used to approximate the solution, but feasible problems are limited to about 7 parts and 30 features.

Agarwal and Roth [6] use SNoW to learn a classifier on a sparse representation of patches extracted around interest points in the image. In [7], Leibe and Schiele use a voting scheme to predict the object configuration from the locations of individual patches. Both approaches provide localization, but require manually localized objects in the training images. Hillel et al. [8] independently proposed an approach similar to ours; their model, however, has higher learning complexity and inferior detection performance despite being discriminative in nature.

In this paper, we present a generative probabilistic model for detection and localization of objects that can be learnt efficiently with minimal supervision. The first crucial property of the model is that it represents the configuration of multiple object parts with respect to an unobserved, abstract object root (unlike [9, 10], where the object root is chosen as one of the visible parts of the object). This simplifies localization and allows our model to overcome occlusion and errors in feature extraction. The model also becomes symmetric with respect to the visible parts. The second crucial assumption is that a single part can generate multiple features in the image (or none). This may seem counterintuitive, but keypoint detectors generally detect several features around interesting areas. This hypothesis also makes an explicit model of part occlusion unnecessary: occlusion of a part simply means that no feature in the image is produced by it. These assumptions allow us to model all features in the image as being emitted independently conditioned on the object center. As a result, the complexity of inference in our model is linear in the number of parts in the model and the number of features in the image, obviating the exponential complexity of combinatorial assignments in other approaches [4, 5, 11]. This means our model is much easier to train with Expectation Maximization (EM) than constellation models, which enables the use of more features and more complex models, with resulting improvements in both accuracy and localization. Furthermore, we introduce a variational (mean-field) approximation during learning that makes it hundreds of times faster than previous approaches, with no substantial loss of accuracy.

2 Model

Our model of an object category is a template that generates features in the image. Each image is represented as a set {f_j} of F features extracted with the scale-saliency point detector [13]. Each feature is described by its location and appearance; feature extraction and representation are detailed in section 3. As described in the introduction, we hypothesize that, given the object center, all features are generated independently:

p_obj(f_1, ..., f_F) = \sum_{o_c} P(o_c) \prod_j p(f_j | o_c).

The abstract object center, which does not generate any features, is represented by a hidden random variable o_c. For simplicity it takes values on a discrete grid of size N_x × N_y inside the image, and o_c is assumed to be a priori uniformly distributed over its domain. Conditioned on the object center, each feature is generated by a mixture of P parts plus a background part. A set of hidden variables {ω_ij} represents which part i produced feature f_j; these variables take values in {0, 1}, restricted so that \sum_{i=1}^{P+1} ω_ij = 1. In other words, ω_ij = 1 means feature j was produced by part i; each part can produce multiple features, but each feature is produced by exactly one part.
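To make this factorization concrete, here is a minimal NumPy sketch (not the authors' code) of the marginal image log-likelihood; the per-feature term p(f_j | o_c), whose form is given next, is treated here as a black-box callable, and all names and array formats are illustrative assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def log_p_obj(features, center_grid, log_p_feature_given_center, log_prior_center=None):
    """log p_obj(f_1..f_F) = log sum_{o_c} P(o_c) prod_j p(f_j | o_c).

    features:    iterable of F feature descriptions (location + appearance).
    center_grid: (C, 2) array of candidate object-center locations o_c (C = Nx * Ny).
    log_p_feature_given_center(f, o_c): hypothetical callable returning log p(f | o_c),
        i.e. the mixture over parts described below.
    """
    C = len(center_grid)
    if log_prior_center is None:
        log_prior_center = np.full(C, -np.log(C))   # uniform prior over the grid

    # Conditional independence given o_c: sum feature log-likelihoods per candidate center.
    log_cond = np.array([sum(log_p_feature_given_center(f, o_c) for f in features)
                         for o_c in center_grid])

    # Marginalize the hidden center with a log-sum-exp over the grid.
    return logsumexp(log_prior_center + log_cond)
```

Every feature is scored once per candidate center (and once per part inside the callable), with no combinatorial part-to-feature assignment; this is the source of the linear complexity claimed above.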
The distribution of a feature conditioned on the object center is then p(f_j | o_c) = \sum_i p(f_j, ω_ij = 1 | o_c) = \sum_i p(f_j | ω_ij = 1, o_c) π_i, where π_i is the prior emission probability of part i, subject to \sum_{i=1}^{P+1} π_i = 1. Each part has a location distribution with respect to the object center, a two-dimensional full-covariance Gaussian p^i_L(x | o_c). The appearance (see section 3 for details) of a part does not depend on the configuration of the object; we consider two models:

Gaussian Model (G): Appearance p^i_A is modeled as a k-dimensional diagonal-covariance Gaussian distribution.

Local Topic Model (LT): Appearance p^i_A is modeled as a multinomial distribution over a previously learnt k-word image patch dictionary. This can be considered a local topic model.

Let θ denote the set of parameters. The complete data likelihood (joint distribution) for an image under the object model is then

P^obj_θ({ω_ij}, o_c, {f_j}) = P(o_c) \prod_{j,i} [ p^i_L(f_j | o_c) p^i_A(f_j) π_i ]^{[ω_ij = 1]},   (1)

where [expr] is one if expr is true and zero otherwise. Marginalizing, the probability of the observed image under the object model is

P^obj_θ({f_j}) = \sum_{o_c} P(o_c) \prod_j \sum_i P(f_j, ω_ij = 1 | o_c).   (2)

The background model assumes all features are produced independently, with uniform location in the image. In the G model, the background appearance is modeled with a k-dimensional full-covariance Gaussian distribution; in the LT model, we use a multinomial distribution over the k-word image patch dictionary.

2.1 Learning

The maximum-likelihood solution for the parameters of the above model does not have a closed form. To train the model, the parameters are computed numerically using the approach of [14], minimizing a free energy F_e associated with the model that is an upper bound on the negative log-likelihood. Following [14], we denote by v = {f_j} the set of visible variables and by h = {o_c, ω_ij} the set of hidden variables. Let D_KL be the K-L divergence:

F_e(Q, θ) = D_KL( Q(h) || P_θ(h | v) ) − log P_θ(v) = \int_h Q(h) log [ Q(h) / P_θ(h, v) ] dh.   (3)

In this bound, Q(h) can be a simpler approximation of the posterior probability P_θ(h | v), which is used to compute estimates and update parameters. Minimizing eq. 3 with respect to Q and θ under different restrictions produces a range of algorithms, including exact EM, variational learning and others [14]. Table 1 shows sample updates and the complexity of these algorithms, together with a comparison to other relevant work.

The background model is learnt before the object model is trained. For the Gaussian appearance model, the background appearance is a single Gaussian whose mean and covariance are estimated as the sample mean and covariance. For the Local Topic model, the multinomial distribution is estimated as the sample histogram. The background feature location model is uniform and has no parameters.

EM learning for the object model: In the E-step, the set of parameters θ is fixed and F_e is minimized with respect to Q(h) without restrictions. This is equivalent to computing the actual posteriors in EM [14, 15]; in this case the optimal solution factorizes as Q(h) = Q(o_c) Q(ω_ij | o_c) = P(o_c | v) P(ω_ij | o_c, v). In the M-step, F_e is minimized with respect to the parameters θ using the current estimate of Q. Due to the conditional independence introduced in the model, inference is tractable and thus the E-step can be computed efficiently. The overall complexity of inference is O(F P N_x N_y).
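As a concrete illustration of the exact E-step (our own sketch, not the paper's code; array shapes and names are assumptions), the posteriors Q(o_c) and Q(ω_ij = 1 | o_c) can be computed from per-center, per-feature, per-part log-likelihood terms as follows, with the stated O(F P N_x N_y) cost dominated by filling the log-likelihood array.

```python
import numpy as np
from scipy.special import logsumexp

def exact_e_step(log_lik, log_pi, log_prior_center):
    """Exact E-step posteriors for one image.

    log_lik: (C, F, P1) array with log p(f_j | w_ij = 1, o_c) for each candidate center c,
             feature j and part i (P1 = P + 1 includes the background part, whose entries
             do not depend on the center).
    log_pi:  (P1,) log emission priors pi_i.
    log_prior_center: (C,) log P(o_c), e.g. uniform over the Nx * Ny grid.

    Returns Q_center of shape (C,) and Q_part of shape (C, F, P1),
    where Q_part[c, j, i] = Q(w_ij = 1 | o_c = c, v).
    """
    joint = log_lik + log_pi                        # log p(f_j, w_ij = 1 | o_c)
    per_feature = logsumexp(joint, axis=2)          # log p(f_j | o_c), shape (C, F)

    # Q(w_ij = 1 | o_c, v): normalize over parts for each (center, feature) pair.
    Q_part = np.exp(joint - per_feature[:, :, None])

    # Q(o_c | v): prior times product over features, normalized over the center grid.
    log_center = log_prior_center + per_feature.sum(axis=1)
    Q_center = np.exp(log_center - logsumexp(log_center))
    return Q_center, Q_part
```

The M-step then re-estimates the location Gaussians, appearance models and mixing weights π_i from these responsibilities, accumulated over the training images.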

Model | Update for μ^i_L | Complexity | Time | (F, P)
Fergus et al. [5] | N/A | F^P | 36 hrs | (30, 7)
Our model (EM) | μ^i_L = [ \sum_n \sum_{o_c} Q(o_c) \sum_j Q(ω_ji | o_c) (x_j − o_c) ] / [ \sum_n \sum_{o_c} Q(o_c) \sum_j Q(ω_ji | o_c) ] | F P N_x N_y | 3 hrs | (50, 30)
Our model (Variational) | μ^i_L = [ \sum_n \sum_j Q(ω_ji) (x_j − \sum_{o_c} Q(o_c) o_c) ] / [ \sum_n \sum_j Q(ω_ji) ] | F P + N_x N_y | 3 mins | (100, 30)

Table 1: An example of an update, the overall complexity and the convergence time for our models and for [5], for different numbers of features per image (F) and parts in the object model (P). There is an increase in speed of several orders of magnitude with respect to [5] on similar hardware.

Variational learning: In this approach a mean-field approximation of Q is used. In the E-step the parameters θ are fixed and F_e is minimized with respect to Q under the restriction that it factorizes as Q(h) = Q(o_c) Q(ω_ij). This corresponds to decoupling the location (o_c) and the part-feature assignment (ω_ij) in the approximation Q of the posterior P_θ(h | v). In the M-step, Q is fixed and the free energy F_e is minimized with respect to θ. A comparison between the EM and variational updates of the location mean μ^i_L of a part is shown in table 1. The overall complexity of inference is now O(FP) + O(N_x N_y); this represents a further speedup of orders of magnitude with respect to the already efficient EM learning. The impact of the variational approximation on performance is discussed in section 4.

2.2 Detection and localization

For detection of object presence, a natural decision rule is the likelihood ratio test: after the models are learnt, P^obj_θ({f_j}) / P^bg({f_j}) is compared to a threshold for each test image. Once the presence of the object is established, the most likely location is given by the MAP estimate of o_c. We assign parts in the model to the object if they exhibit consistent appearance and location: to remove model parts representing background, we threshold the entropy of the appearance distribution for the LT model (the determinant of the location covariance for the G model). The MAP estimate of which features in the image are assigned to parts in the model (marginalizing over the object center) determines the support of the object. Bounding boxes include all keypoints assigned to the object, as well as the means of all model parts belonging to the object even if no keypoint is observed to be produced by such a part; this explicitly handles occlusion (fig. 1).

3 Experimental setup

The performance of the method depends on the feature detector making consistent extractions across different instances of objects of the same type. We use the scale-saliency interest point detector proposed in [13], which selects regions exhibiting unpredictable characteristics over both location and scale. The F regions with the highest saliency in the image provide the features for learning and recognition. After the keypoints are detected, patches are extracted around these points and scale-normalized, and a SIFT descriptor [16] (without orientation) is computed from each patch. For model G, due to the high dimensionality of the resulting space, PCA is performed, choosing k = 15 components to represent the appearance of a feature. For model LT, we instead cluster the appearance of features in the original SIFT space with a Gaussian mixture model with k = 250 components and use the most likely cluster as the feature's appearance representation. For all experiments we use P = 30 parts. The number of features is F = 50 for the G model and F = 100 for the LT model, and N_x N_y = 238.
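As an illustration of this feature pipeline (not the authors' implementation), the sketch below builds the two appearance representations with opencv-python and scikit-learn; OpenCV's SIFT detector is used as a stand-in for the scale-saliency detector of [13], and the component counts (k = 15, k = 250) follow the text.

```python
import cv2
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def extract_features(gray_image, num_features=100):
    """Detect keypoints and compute descriptors. SIFT detection stands in here for the
    scale-saliency detector of [13]; the strongest num_features points are kept."""
    sift = cv2.SIFT_create(nfeatures=num_features)
    keypoints, descriptors = sift.detectAndCompute(gray_image, None)
    locations = np.array([kp.pt for kp in keypoints])     # (F, 2) feature locations
    return locations, descriptors                          # descriptors: (F, 128)

def fit_appearance_models(training_descriptors):
    """Appearance representations: PCA with k = 15 components for the G model, and a
    k = 250 component Gaussian mixture whose most likely component plays the role of
    the visual word for the LT model."""
    pca_g = PCA(n_components=15).fit(training_descriptors)
    gmm_lt = GaussianMixture(n_components=250, covariance_type='diag',
                             max_iter=50).fit(training_descriptors)
    return pca_g, gmm_lt

# Usage sketch (paths and variable names are hypothetical):
# locs, desc = extract_features(cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE))
# pca_g, gmm_lt = fit_appearance_models(np.vstack(all_training_descriptors))
# g_appearance = pca_g.transform(desc)      # 15-dimensional appearance for the G model
# lt_words = gmm_lt.predict(desc)           # dictionary word index for the LT model
```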
We test our approach on the Caltech 5 datasets: faces, motorbikes, airplanes and spotted cats vs. the Caltech background set, and cars (rear) 2001 vs. the cars background set [5]. We initialize the appearance and location of the parts with P randomly chosen features from the training set. The stopping criterion is the change in F_e.
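Before turning to results, the mean-field E-step of section 2.1 can be sketched as follows (our own simplified reading, with hypothetical shapes and names): the location term in the part update is evaluated at the expected center, and the center update exploits the Gaussian location model so that its log-score is a quadratic in o_c whose coefficients are accumulated in O(FP) and then evaluated on the grid in O(N_x N_y).

```python
import numpy as np
from scipy.special import logsumexp

def mean_field_e_step(feat_xy, log_app, part_means, part_cov_inv, log_pi,
                      center_grid, log_prior_center, n_sweeps=5):
    """Sketch of a mean-field E-step with Q(h) = Q(o_c) * prod_j Q(w_.j).

    feat_xy       (F, 2)      feature locations x_j
    log_app       (F, P1)     log appearance terms log p_A^i(f_j), background in last column
    part_means    (P1, 2)     part location means relative to the object center
    part_cov_inv  (P1, 2, 2)  inverse location covariances
    log_pi        (P1,)       log emission priors
    center_grid   (C, 2)      candidate centers o_c;  log_prior_center: (C,)

    The location term in the Q(w) update is evaluated at E_Q[o_c], dropping a covariance
    correction for brevity; the background part gets a constant (uniform) location term.
    """
    log_norm = 0.5 * np.log(np.linalg.det(part_cov_inv)) - np.log(2.0 * np.pi)   # (P1,)
    Q_center = np.exp(log_prior_center - logsumexp(log_prior_center))
    for _ in range(n_sweeps):
        exp_center = Q_center @ center_grid                        # E_Q[o_c]
        rel = feat_xy[:, None, :] - exp_center - part_means        # (F, P1, 2)
        maha = np.einsum('fpi,pij,fpj->fp', rel, part_cov_inv, rel)
        log_loc = -0.5 * maha + log_norm
        log_loc[:, -1] = 0.0                                       # background: uniform location
        log_q = log_loc + log_app + log_pi
        Q_part = np.exp(log_q - logsumexp(log_q, axis=1, keepdims=True))   # O(F * P)

        # Q(o_c) update: sum_{j,i} Q(w_ij) log N(x_j - o_c; mu_i, S_i) is quadratic in o_c,
        # so accumulate its coefficients over features and parts (excluding background) ...
        w = Q_part[:, :-1]
        A = np.einsum('fp,pij->ij', w, part_cov_inv[:-1])
        d = feat_xy[:, None, :] - part_means[:-1]                  # (F, P, 2)
        b = np.einsum('fp,pij,fpj->i', w, part_cov_inv[:-1], d)
        # ... and evaluate the quadratic on the Nx * Ny grid of candidate centers.
        quad = -0.5 * np.einsum('ci,ij,cj->c', center_grid, A, center_grid) + center_grid @ b
        log_c = log_prior_center + quad
        Q_center = np.exp(log_c - logsumexp(log_c))
    return Q_center, Q_part
```

The corresponding M-step (for example the μ^i_L update in table 1) then reuses these Q(ω) and Q(o_c) factors, accumulated over the training images.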

Figure 1: Local Topic model for the faces, motorbikes and airplanes datasets [5]. In (a) the most likely location of the object center is plotted as a black circle; with respect to this reference, the spatial distribution (2D Gaussian) of each part associated with the object is plotted in green. In (b) the centers of all extracted features are depicted: blue ones are assigned by the model to the object and red ones to the background. The bounding box is plotted in blue. Image (c) shows how many features in the image are assigned to the same part (a property of our model, not shared by [5]): six parts are chosen, their spatial distribution is plotted (green), and the features assigned to them are depicted in blue. Eyes (4, 5), mouth (3) and left ear (6) each have multiple assignments. For each of these parts, image (d) shows the best matches among features extracted from the dataset. Note that the local topic model can learn parts uniform in appearance (e.g. eyes) but also more complex parts (e.g. the mouth part includes moustaches, beards and chins); the G appearance model and [5] do not have this property. The images in (e) show the robustness of the method in cases with occlusion, missed detections and one caricature of a face. Images (f) and (g) show plots for motorbikes, and (h) and (i) for airplanes.

4 Results

Detection: Although we believe that localization is an essential performance criterion, it is useless if the approach cannot detect objects. Figure 2 shows equal-error-rate detection performance for our models and for [5, 3, 8]. We cannot compare our range of performance (over train/test splits), shown on the plot, because this data is not available for the other approaches. Our method is robust to initialization (the variance over starting points is negligible compared to the train/test split variance). The results show higher detection performance for all our algorithms compared to the generative model presented in [5]. The local topic (LT) model performs better than the model presented in [8]. The purely discriminative approach presented in [3] shows higher detection performance with different ("optimal combination") features, but performs worse with the features we are using. The LT model showed consistently higher detection performance than the Gaussian (G) model. For both the LT and G models, the variational approximations showed discriminative power similar to that of the respective exact models. Unlike [5, 3], our model currently is not scale invariant; nevertheless, the probabilistic nature of the model allows for some tolerance to scale changes.

In datasets of manageable size, it is inevitable that the background is correlated with the object. The result is that most modern methods that infer the template from partially supervised data tend to model some background regions as parts lying on the object (see figure 4). Doing so tends to increase detection performance; it is reasonable to expect this increase will not persist in the face of a dramatic change in background. One symptom of this phenomenon (as in classical overfitting) is that methods that detect very well may be bad at localization, because they cannot separate the object from the background. We are able to avoid this difficulty by predicting object extent, conditioned on detection, using only a subset of parts known to have relatively low variance in location or appearance given the object center. We do not yet have an estimate of the increase in detection rate resulting from overfitting; this is a topic of ongoing research. In our opinion, if a method detects well but performs poorly at localization, the reason may be overfitting.

Localization: Previous work on localization required aligned images (bounding boxes) or segmentation masks [7, 6]. A novel property of our model is that it learns to localize the object and determine its spatial extent without supervision. Figure 1 shows learned models and examples of localization. There is no standard measure of localization performance in an unsupervised setting: the object center can be learnt at any position in the image, provided that this position is consistent across all images. We therefore use as our performance measure the standard deviation of the estimated object centers and bounding boxes (obtained as in section 2.2), after normalizing the estimates for each image to a coordinate system in which the ground-truth bounding box is the unit square with corners (0,0) and (1,1). As a baseline we use the rectified center of the image. All objects of interest in both the airplane and motorbike datasets are centered in the image; as a result the baseline is a good predictor of the object center and is hard to beat. In the faces dataset, however, there is much more variation in location, and there the advantage of our approach becomes clear.
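To make this measure concrete, a small sketch (illustrative only; names and box format are assumptions): each estimated center is mapped into the frame in which its image's ground-truth box is the unit square, and the spread of the normalized points is reported.

```python
import numpy as np

def normalized_center_spread(pred_centers, gt_boxes):
    """Standard deviation of predicted object centers after rectifying each image so that
    its ground-truth bounding box becomes the unit square (0,0)-(1,1).

    pred_centers : (N, 2) predicted centers (x, y) in image coordinates
    gt_boxes     : (N, 4) ground-truth boxes as (x_min, y_min, x_max, y_max)
    Returns the per-axis standard deviation in units of the ground-truth box size.
    """
    boxes = np.asarray(gt_boxes, dtype=float)
    centers = np.asarray(pred_centers, dtype=float)
    size = boxes[:, 2:] - boxes[:, :2]              # box width and height per image
    normalized = (centers - boxes[:, :2]) / size    # maps the box to (0,0)-(1,1)
    return normalized.std(axis=0)                   # multiply by 100 for percentage units

# The image-center baseline in the text corresponds to passing the (rectified) image
# centers as pred_centers.
```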
Figure 3 shows scatterplots of the normalized object centers and bounding boxes. The table in figure 2 shows the localization results using the proposed metric.

Variational approximation comparison: Unusually for a variational approximation, it is possible to compare it to the exact model. The results are excellent, especially for the G model; this is consistent with our observation that during learning the variational approximation is good in this case (the free energy bound appears tight). For the LT model, on the other hand, the variational bound is loose during learning, and localization performance is comparable to, though slightly lower than, that of the exact model. This may be explained by the fact that the Gaussian appearance model is less flexible than the topic model, and thus the G model can better tolerate the decoupling of location and appearance.

Figure 2: Plots on the left show detection performance on the Caltech 5 datasets [5]; equal error rate is reported. The original performance of the constellation model [5] is denoted by C. We denote by DLc the performance (best in the literature) reported by [3] using an optimal combination of feature types, and by DL the performance using our features. The performance of [8] is denoted by B. We show performance for our G model (G), LT model (LT) and their variational approximations (GV) and (LTV), respectively. We report median performance over 20 runs and the performance range excluding the 10% best and 10% worst runs. On the right we show localization performance for all models on the Faces dataset and the performance of the best model (LT) on all datasets. Standard deviation is reported in percentage units with respect to the ground-truth bounding box; for bounding boxes we average the standard deviation in each direction. BL denotes baseline performance.

Figure 3: The airplane and motorbike datasets are aligned, so the image-center baseline (b), (d) performs well there; our localization performs similarly (a), (c). There is more variation in location in the faces dataset: scatterplot (f) shows the baseline performance and (g) shows the performance of our model. (e) shows the bounding boxes computed by our approach (LT model). Object centers and bounding boxes are rectified using the ground-truth bounding boxes (blue). No information about the location or spatial extent of the object is given to the algorithm.

Figure 4: Approaches like [3] do not use geometric constraints during learning; therefore, correlation between background and object in the dataset is incorporated into the object model. Here the ellipses represent the features used by the algorithm in [3] to decide the presence of a face and a motorbike (left images taken from [3]). Our model (right images), on the other hand, can estimate the location and support of the object, even though no information about it is provided during learning. Blue circles represent the features assigned by the model to the face; the red points are centers of features assigned to the background (plot for the Local Topic model).

5 Conclusions and future work

We have presented a novel model for object categories. Our model allows efficient unsupervised learning, bringing the learning time down to a few hours for full models and to minutes for the variational approximations. The significant reduction in complexity allows us to handle many more parts and features than comparable algorithms. The detection performance of our approach compares favorably to the state of the art, even against purely discriminative approaches. Our model is also capable of learning the spatial extent of objects without supervision, with good results. This combination of fast learning and the ability to localize is required to tackle challenging problems in computer vision. Among the most interesting applications we see unsupervised segmentation, and the learning, detection and localization of multiple object categories, deformable objects and objects with varying aspect.

References

[1] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Proc. of CVPR, 2001.
[2] G. Csurka, C. Dance, L. Fan, and C. Bray. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, ECCV, pages 1-22, 2004.
[3] G. Dorkó and C. Schmid. Object class recognition using discriminative local features. Submitted to IEEE Trans. on PAMI.
[4] M. Weber, M. Welling, and P. Perona. Unsupervised learning of models for recognition. In Proc. of ECCV (1), pages 18-32, 2000.
[5] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In Proc. of CVPR, 2003.
[6] S. Agarwal and D. Roth. Learning a sparse representation for object detection. In Proc. of ECCV, volume 4, Copenhagen, Denmark, May 2002.
[7] B. Leibe, A. Leonardis, and B. Schiele. Combined object categorization and segmentation with an implicit shape model. In Workshop on Statistical Learning in Computer Vision, pages 17-32, May 2004.
[8] A. B. Hillel, T. Hertz, and D. Weinshall. Efficient learning of relational object class models. In Proc. of ICCV, October 2005.
[9] R. Fergus, P. Perona, and A. Zisserman. A sparse object category model for efficient learning and exhaustive recognition. In Proc. of CVPR, June 2005.
[10] D. Crandall, P. Felzenszwalb, and D. Huttenlocher. Spatial priors for part-based recognition using statistical models. In Proc. of CVPR, pages 10-17, 2005.
[11] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In Workshop on Generative-Model Based Vision, Washington, DC, June 2004.
[12] A. Opelt, M. Fussenegger, A. Pinz, and P. Auer. Generic object recognition with boosting. Technical Report TR-EMT-2004-01, EMT, TU Graz, Austria, 2004. Submitted to IEEE Trans. on PAMI.
[13] T. Kadir and M. Brady. Saliency, scale and image description. IJCV, 45(2):83-105, 2001.
[14] B. Frey and N. Jojic. A comparison of algorithms for inference and learning in probabilistic graphical models. IEEE Trans. on PAMI, 27(9), 2005.
[15] R. Neal and G. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan, editor, Learning in Graphical Models. MIT Press, Cambridge, MA, USA, 1998.
[16] D. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91-110, 2004.
