Multiple Pose Context Trees for estimating Human Pose in Object Context

Vivek Kumar Singh, Furqan Muhammad Khan, Ram Nevatia
University of Southern California, Los Angeles, CA
{viveksin furqankh

Abstract

We address the problem of estimating pose in a static image of a human performing an action that may involve interaction with scene objects. In such scenarios, pose can be estimated more accurately using knowledge of the scene objects. Previous approaches do not make use of such contextual information. We propose Pose Context Trees to jointly model human pose and object, which allows both accurate and efficient inference when the nature of the interaction is known. To estimate the pose in an image, we present a Bayesian framework that infers the optimal pose-object pair by maximizing the likelihood over multiple pose context trees for all interactions. We evaluate our approach on a dataset of 65 images, and show that the joint inference of pose and context gives higher pose accuracy.

1. Introduction

We consider the problem of estimating 2D human pose in static images where the human is performing an action that involves interaction with scene objects. For example, a person interacts with the soccer ball with his/her leg while dribbling or kicking the ball. In such a case, when the part-object interaction is known, the object position can be used to improve the pose estimation. For instance, if we know the position of either the soccer ball or the leg, it can be used to improve the estimation of the other (see figure 1). However, to determine the interaction in an image, pose and object estimates are themselves needed. We propose a framework that simultaneously estimates the human pose and determines the nature of the human-object interaction. Note that the primary objective of this work is to estimate the human pose; however, in order to improve the pose estimate using object context, we also determine the interaction.
Numerous approaches have been developed for estimating human pose in static images [4, 10, 18, 16, 14], but these do not use any contextual information from the scene. Multiple attempts have been made to recognize human pose and/or action, both with and without using scene context [6, 9, 8, 17]. These approaches first estimate the human pose and then use the estimated pose as input to determine the action. Although general human pose estimation has seen significant advances in recent years, especially using part-based models [4], these approaches still produce ambiguous results that are not accurate enough to recognize human actions. [5] used a part-based model [14] to obtain pose hypotheses and used these hypotheses to obtain descriptors to query poses in unseen images. [8] used the upper body estimates obtained using [4] to obtain hand trajectories, and used these trajectories to simultaneously detect objects and recognize human-object interaction. [17] discovers action classes using the shape of humans described by shape context histograms. Recently, attempts have also been made to classify image scenes by jointly inferring over multiple object classes [13, 3], such as the co-occurrence of human and racket for recognizing tennis [13]. [3] also used the relative spatial context of the objects to improve object detection.

Figure 1. Effect of object context on human pose estimation. The panels show sample images of players playing soccer and basketball, the human pose estimated using tree-structured models [14], and the human pose estimated using the object context.

Previous approaches that model human-object interaction either use a coarse estimate of the human to model the interaction [9, 3] or estimate the human pose as a pre-step to simultaneous object and action classification [8]. In this work, we use an articulated human pose model with 10 body parts, which allows accurate modeling of human and object interaction. More precisely, we propose a graphical model, the Pose Context Tree, to simultaneously estimate the human pose and the object. The model is obtained by adding an object node to the tree-structured part model for human pose [4, 10, 18, 16] such that the resulting structure is still a tree, and thus allows efficient and exact inference [4]. To automatically determine the interaction, we consider multiple pose context trees, one for each possible human-object interaction, based on which part may interact with the object. The best pose is inferred as the pose that corresponds to the maximum likelihood score over the set of all trees. We also consider the probability of absence of an object, which allows us to determine whether the image contains none of the known interactions.

Thus, our contribution is two-fold:
- Pose context trees to jointly estimate detailed human pose and object, which allows for an accurate interaction model.
- A Bayesian framework to jointly infer human pose and human-object interaction in a single image; when the interaction in the image is not modeled, our algorithm reports an unknown interaction and estimates the human pose without any contextual information.

To evaluate our approach, we collected images from the Internet and previously released datasets [2, 14, 9]. Our dataset has 65 images, out of which 44 have a person either kicking or dribbling a soccer ball, or holding a basketball. We demonstrate that our approach improves the pose accuracy over the dataset by 5% for all parts and by about 8% for parts that are involved in the interaction, for example, legs for soccer.
In the rest of the paper, we first discuss the pose context tree in section 2 and the algorithm for simultaneous pose estimation and action recognition in section 3. Next, we present the model learning in section 4, followed by the experiments in section 5 and the conclusion in section 6.

2. Modeling Human Body and Object Interaction

We use a tree-structured model to represent the interaction between the human pose and a manipulable object involved in an action. We refer to this as the Pose Context Tree. Our model is based on the observation that during an activity that involves human-object interaction, such as playing tennis or basketball, humans interact with the object using part extremities, i.e. hands, lower legs and head; for example, basketball players hold the ball in their hands while shooting. We first describe a tree-structured model of the human body, and then demonstrate how we extend it to a pose context tree that can be used to simultaneously infer the pose and the object.

[Tree-structured Human Body Model] We represent the human pose using a tree pictorial structure [4] with the torso at the root and the four limbs and head as its branches (see figure 2). The human body $X$ is denoted as a joint configuration of body parts $\{x_i\}$, where $x_i = (p_i, \theta_i)$ encodes the position and orientation of part $i$. Given the full object $X$, the model assumes that the likelihood maps for the parts are conditionally independent and that the parts are kinematically constrained using model parameters $\Theta$. Under this assumption, the posterior likelihood of the object $X$ given the image observations $Y$ is

\[
\begin{aligned}
P(X \mid Y, \Theta) &\propto P(X \mid \Theta)\, P(Y \mid X, \Theta) \\
&= P(X \mid \Theta) \prod_i P(Y \mid x_i) \\
&\propto \exp\Big( \sum_{(i,j) \in E} \psi_{ij}(x_i, x_j \mid \Theta) + \sum_i \phi_i(Y \mid x_i) \Big)
\end{aligned}
\tag{1}
\]

where $(V, E)$ is the graphical model; $\phi_i(\cdot)$ is the likelihood potential for part $i$; and $\psi_{ij}(\cdot)$ is the kinematic prior potential between parts $i$ and $j$, modeled using $\Theta$. For efficiency, priors are assumed to be Gaussian [4].

[Pose Context Tree] The pose context tree models the interaction between the human body and an object involved.
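The exponent of equation 1 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the authors' implementation: parts are reduced to 2D positions (orientation is ignored), the Gaussian kinematic prior is isotropic, and the part names, mean offsets, and flat likelihoods are hypothetical values chosen for the example.

```python
def gaussian_log_prior(parent_xy, child_xy, mean_offset, sigma):
    """Unnormalized log of an isotropic Gaussian on the parent-to-child offset."""
    dx = (child_xy[0] - parent_xy[0]) - mean_offset[0]
    dy = (child_xy[1] - parent_xy[1]) - mean_offset[1]
    return -(dx * dx + dy * dy) / (2.0 * sigma * sigma)


def log_posterior(pose, log_likelihood, edges, mean_offsets, sigma):
    """Unnormalized log P(X | Y, Theta) for a tree-structured pose.

    pose:           {part_name: (x, y)}
    log_likelihood: {part_name: function mapping (x, y) to phi_i(Y | x_i)}
    edges:          list of (parent, child) pairs, the edge set E of the tree
    mean_offsets:   {(parent, child): expected (dx, dy)}, hypothetical prior means
    """
    # Sum of the part likelihood potentials phi_i.
    score = sum(log_likelihood[part](xy) for part, xy in pose.items())
    # Sum of the pairwise kinematic potentials psi_ij over the tree edges.
    for parent, child in edges:
        score += gaussian_log_prior(pose[parent], pose[child],
                                    mean_offsets[(parent, child)], sigma)
    return score


# Hypothetical three-part chain: torso -> upper leg -> lower leg.
EDGES = [("torso", "upper_leg"), ("upper_leg", "lower_leg")]
OFFSETS = {edge: (0, 10) for edge in EDGES}
FLAT = {part: (lambda xy: 0.0) for part in ("torso", "upper_leg", "lower_leg")}

ideal = {"torso": (0, 0), "upper_leg": (0, 10), "lower_leg": (0, 20)}
bent = {"torso": (0, 0), "upper_leg": (0, 10), "lower_leg": (3, 24)}
print(log_posterior(ideal, FLAT, EDGES, OFFSETS, sigma=2.0))
print(log_posterior(bent, FLAT, EDGES, OFFSETS, sigma=2.0))
```

With flat likelihoods, the pose that matches the mean offsets scores highest, and any deviation is penalized quadratically. A pose context tree would add one more edge, from the interacting part to the object node, to the same sum.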
Since humans often interact with scene objects using part extremities (legs, hands), pose context trees are obtained by adding an object node to a leaf node in the human tree model (see figure 2). We represent the pose and object jointly using $X^o = \{x_i\} \cup z_o$, where $z_o$ is the position of the object $O$.

\[
\begin{aligned}
P(X^o \mid Y, \Theta) &\propto P(X^o \mid \Theta)\, P(Y \mid X^o, \Theta) \\
&= P(X^o \mid \Theta) \Big( \prod_i P(Y \mid x_i) \Big) P(Y \mid z_o)
\end{aligned}
\tag{2}
\]

where $P(X^o \mid \Theta)$ is the joint prior for the body parts and the object $O$. Since the object and interaction are known, we assume knowledge of the body part involved in the interaction with the object. As the graphical model with the context node is a tree, the joint kinematic prior $P(X^o \mid \Theta)$ can be written as $P(X \mid \Theta_P)\, P(x_k, z_o \mid \Theta_a)$, where $k$ is the part interacting with the object, $\Theta_P$ is the kinematic prior for the body parts, and $\Theta_a$ is the spatial prior for interaction model $a$ between $z_o$ and $x_k$. Thus, the joint likelihood can now be

written as

\[
\begin{aligned}
P(X^o \mid Y, \Theta) &\propto P(X \mid \Theta_P)\, P(x_k, z_o \mid \Theta_a) \Big( \prod_i P(Y \mid x_i) \Big) P(Y \mid z_o) \\
&\propto P(X \mid Y, \Theta_P) \exp\big( \psi_a(x_k, z_o \mid \Theta_a) + \phi_o(Y \mid z_o) \big)
\end{aligned}
\tag{3}
\]

where $\phi_o(\cdot)$ is the likelihood potential of the object $O$, and $\psi_a(\cdot)$ is the object-pose interaction potential between $O$ and the interacting body part $k$ for interaction model $a$ (given by equation 4).

\[
\psi_a(x_k, z_o) =
\begin{cases}
1 & \text{if } \lvert T_{ko}(x_k) - T_{ok}(z_o) \rvert < d^{a}_{ko} \\
0 & \text{otherwise}
\end{cases}
\tag{4}
\]

where $T_{ko}(\cdot)$ is the relative position of the point of interaction between $O$ and part $k$ in the coordinate frame of the object $O$.

Figure 2. Pose Model: tree-structured human model with observation nodes, and Pose Context Tree for object interaction with the left lower leg; the object node is marked as double-lined. Node labels: x_h, x_t, x_lua, x_lla, x_rua, x_rla, x_lul, x_lll, x_rul, x_rll, and x_o for the object.

3. Human Pose Estimation using Context

We use a Bayesian formulation to jointly infer the human pose and the part-object interaction in a single image. Here, by inferring the part-object interaction we mean estimating the object position and the interacting body part. Note that unlike [9], our model is generative and does not assume that the set of interactions forms a closed set, i.e. we consider the set of interactions $A = \{a_i\} \cup \phi$. The joint optimal pose configuration and interaction pair $(X^*, a^*)$ is defined as

\[
(X^*, a^*) = \arg\max_{X, a} P(X, a \mid Y, \Theta) = \arg\max_{X, a} P(a \mid X, Y, \Theta)\, P(X \mid Y, \Theta)
\tag{5}
\]

We define the conditional likelihood of interaction $a$ given the pose estimate $X$, observations $Y$ and model parameters $\Theta$ as the product of the likelihood of the corresponding object $O$ in the neighborhood of $X$ and the likelihood of absence of objects that correspond to other interactions in $A$, i.e.
\[
P(a \mid X, Y, \Theta) \propto P(z_o \mid X, Y, \Theta) \prod_{a' \in A \setminus \{a\}} \big( 1 - P(z_{o(a')} \mid X, Y, \Theta) \big)
\tag{6}
\]

Combining equations 5 and 6, we can obtain the joint optimal pose-action pair as

\[
\begin{aligned}
(X^*, a^*) &= \arg\max_{X, a} P(z_o \mid X, Y, \Theta)\, P(X \mid Y, \Theta) \prod_{a' \in A \setminus \{a\}} \big( 1 - P(z_{o(a')} \mid X, Y, \Theta) \big) \\
&= \arg\max_{X, a} P(X^o \mid Y, \Theta) \prod_{a' \in A \setminus \{a\}} \big( 1 - P(z_{o(a')} \mid X, Y, \Theta) \big)
\end{aligned}
\tag{7}
\]

The joint pose-interaction pair likelihood given in equation 7 can be represented as a graphical model; however, the graph in such a case will have cycles because of edges from object nodes to the left and right body parts. One may use loopy belief propagation to jointly infer over all interactions [9], but in this work we use an alternative approach: we efficiently and accurately solve for each interaction independently and then select the best pose-interaction pair. For each interaction $a$, we estimate the best pose $X^{o*}$ and then add penalties for other objects present close to $X^{o*}$. The optimal pose-interaction pair is then given by

\[
(X^*, a^*) = \arg\max_{a} \left[ \Big( \max_{X^o} P(X^o \mid Y, \Theta) \Big) \prod_{a' \in A \setminus \{a\}} \Big( 1 - \max_{z_{o(a')}} P(z_{o(a')} \mid X^{o*}, Y, \Theta) \Big) \right]
\tag{8}
\]

where $X^{o*}$ can be obtained by solving the Pose Context Tree for the corresponding interaction $a$ (described later in this section). Observe that when $a = \phi$, the problem reduces to finding the best pose given the observation, with a penalty added if objects are found near the inferred pose. Thus our model can be applied to any image even when the interaction in the image is not modeled, thereby making our model more apt for estimating human poses in general scenarios.

3.1. Pose Inference for known Object Context using Pose Context Tree

Given the object context, i.e. the object position and interaction model, the pose is inferred using the pose context tree by maximizing the joint likelihood given by equation 2. Since the corresponding energy equation for pose context tree (eqn 2) has a similar form as that of tree structured human

body model (eqn 1), both can be minimized using similar algorithms [4, 14, 5, 1]. These approaches apply part/object detectors over all image positions and orientations to obtain part hypotheses and then enforce kinematic constraints on these hypotheses using belief propagation [11] over the graphical model. This is sometimes referred to as parsing. Given an image parse of the parts, the best pose is obtained from the part hypotheses by sampling methods such as importance sampling [4], maximum likelihood [1], or data-driven MCMC [12].

[Body Part and Object Detectors] We used the boundary and region templates trained by Ramanan et al. [14] for localizing human pose (see figure 3). Each template is a weighted sum of oriented bar filters, where the weights are obtained by maximizing the conditional joint likelihood (refer to [15] for details on training). The likelihood of a part is obtained by convolving the part boundary template with the Sobel edge map, and the part region template with the part's appearance likelihood map. Since the appearance of the parts is not known at the start, part estimates inferred using boundary templates are used to build the part appearance models (see iterative parsing [14]). For each part, an RGB histogram of the part $h_{fg}$ and of its background $h_{bg}$ is learnt; the appearance likelihood map for the part is then simply given by the binary map $p(h_{fg} \mid c) > p(h_{bg} \mid c)$. For more details, please refer to [14]. For each object class, such as soccer ball, we trained a separate detector with a variant of Histogram of Gradients features [2], the mean RGB and HSV values, and the normalized hue and saturation histograms. The detectors were trained using Gentle AdaBoost [7]. We use a sliding window approach to detect objects in the image; a window is tested for the presence of the object by extracting the image features and running them through boosted decision trees learned from training examples. The details on learning object detectors are described in Section 4.2.
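The binary appearance likelihood map described above, the test p(h_fg | c) > p(h_bg | c), can be sketched as follows. This is a minimal illustration rather than the templates of [14]: it reads p(h | c) as the normalized histogram value of color c, and the bin count, helper names, and pixel samples are hypothetical.

```python
from collections import Counter


def quantize(rgb, bins=8):
    """Map an (r, g, b) triple with 0-255 channels to a coarse histogram bin."""
    step = 256 // bins
    return tuple(channel // step for channel in rgb)


def normalized_histogram(pixels, bins=8):
    """Color histogram over quantized RGB, normalized to sum to 1."""
    counts = Counter(quantize(p, bins) for p in pixels)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def appearance_map(image, fg_pixels, bg_pixels, bins=8):
    """Binary map: 1 where the part histogram scores the pixel color
    strictly higher than the background histogram, else 0."""
    h_fg = normalized_histogram(fg_pixels, bins)
    h_bg = normalized_histogram(bg_pixels, bins)
    result = []
    for row in image:
        out_row = []
        for color in row:
            key = quantize(color, bins)
            out_row.append(1 if h_fg.get(key, 0.0) > h_bg.get(key, 0.0) else 0)
        result.append(out_row)
    return result


# Hypothetical samples: a red part against a green background.
fg_samples = [(255, 0, 0)] * 4
bg_samples = [(0, 255, 0)] * 4
image = [[(250, 5, 5), (0, 250, 0)],
         [(0, 0, 255), (255, 0, 0)]]
print(appearance_map(image, fg_samples, bg_samples))
```

In iterative parsing the foreground samples would come from the part estimate of the previous pass; here they are fixed by hand to keep the example self-contained.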
[Infer Part and Object Distributions] For each part and object, we apply the detector over all image positions and orientations to obtain a dense distribution over the entire configuration space. We then simultaneously compute the posterior distribution of all parts by locally exchanging messages about kinematic information between parts that are connected. More precisely, the message from part $i$ to part $j$ is the distribution of the joint connecting parts $i$ and $j$, based on the observation at part $i$. This distribution is efficiently obtained by transforming the part distribution into the coordinate system of the connecting joint and applying a zero-mean Gaussian whose variance determines the stiffness between the parts [4].

[Selecting the Best Pose and Object] Since the tree-structured model does not represent inter-part occlusion between parts that are not connected, the pose obtained by assembling the maximum posterior estimates for each part [1] does not result in a kinematically consistent pose. Thus, we use a top-down approach for obtaining a pose: we find the maximum likelihood torso estimate first (root) and then find each child part given its parent estimate. This ensures a kinematically consistent pose.

Figure 3. Part Templates: (a) edge-based part templates, (b) region-based part templates; dark areas correspond to a low probability of edge, and bright areas correspond to a high probability.

Figure 4. Inference on Pose Context Tree: (a) sample image of a soccer player, (b) distributions obtained after applying edge templates, (c) joint part distributions of edge and region templates, (d) inferred pose and object.

4. Model Learning

The model learning includes learning the potential functions in the pose context tree, i.e. the body part and the object detectors for computing the likelihood potential, and

the prior potentials for the Pose Context Tree. For the body part detectors, we use the templates provided by Ramanan et al. [14] (for learning these templates, please refer to [15]).

4.1. Prior potentials for Pose Context Tree

Model parameters include the kinematic functions between the parts, $\psi_{ij}$, and the spatial context for each manipulable object $O$, $\psi_{ko}$.

[Human Body Kinematic Prior]: The kinematic function is modeled with Gaussians, i.e. the position of the connecting joint in the coordinate system of both parts, $(m_{ij}, \sigma_{ij})$ and $(m_{ji}, \sigma_{ji})$, and the relative angles of the parts at the connecting joint, $(m^{\theta}_{ij}, \sigma^{\theta}_{ij})$. Given the joint annotations that are available from the training data, we learn the Gaussian parameters with a Maximum Likelihood Estimator [4], [1].

[Pose-Object Spatial Prior]: The spatial function is modeled as a binary potential with a box prior (eqn 4). The box prior $\psi_{ko}$ is parameterized by mean and variance $(m, \sigma)$, and spans the region $[m - \tfrac{1}{2}\sigma,\; m + \tfrac{1}{2}\sigma]$. Given the pose and object annotations, we learn these parameters from the training data.

4.2. Object Detector

For each type of object, we train a separate detector for its class using Gentle AdaBoost [7]. We use a variation of Histogram of Gradients [2] to model the edge boundary distribution, and mean RGB and HSV values for color. For efficiency, we do not perform all the steps suggested by [2] for computing HOGs. We divide the image into a rectangular grid of patches, and for each cell in the grid a histogram of gradients is constructed over orientation bins. Each pixel in the cell casts a vote equal to its gradient magnitude to the bin that corresponds to its orientation. The histograms are then sum-normalized to 1.0. For the appearance model of the objects, normalized histograms over hue and saturation values are constructed for each cell. Thus our descriptor for each cell consists of mean RGB and HSV values, and normalized histograms of oriented gradients, hue and saturation.
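The per-cell gradient histogram described above (magnitude-weighted votes into orientation bins, then sum-normalization to 1.0) can be sketched as below. This covers only the gradient part of the descriptor, not the color terms, and it rests on assumptions that the text does not specify: central-difference gradients on a grayscale cell, unsigned orientations over 180 degrees, hard binning without interpolation, and a hypothetical function name.

```python
import math


def cell_orientation_histogram(cell, num_bins=8):
    """Sum-normalized histogram of gradient orientations for one cell.

    cell: list of rows of grayscale intensities (at least 3 x 3).
    Each interior pixel votes with its gradient magnitude into the bin of
    its unsigned orientation in [0, 180) degrees.
    """
    hist = [0.0] * num_bins
    bin_width = 180.0 / num_bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # horizontal central difference
            gy = cell[y + 1][x] - cell[y - 1][x]   # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue                           # flat pixels cast no vote
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(int(angle / bin_width), num_bins - 1)
            hist[index] += magnitude
    total = sum(hist)
    return [value / total for value in hist] if total > 0 else hist


# Intensity ramps: one increasing along x, one increasing along y.
ramp_x = [[col for col in range(4)] for _ in range(4)]
ramp_y = [[row for _ in range(4)] for row in range(4)]
print(cell_orientation_histogram(ramp_x))
print(cell_orientation_histogram(ramp_y))
```

The full per-window feature would concatenate such histograms over all cells of the grid, together with the mean color values and the hue and saturation histograms.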
For training, we use images downloaded from the Internet for each class. We collected 50 positive samples for each class and 3500 negative samples obtained by randomly sampling windows of fixed size from the images. While training the detector for one class, positive examples from the other classes were also added to the negative set. For robustness, we increased the positive sample set by including small affine perturbations of the positive samples (rotation and scale). For each object detector, the detection parameters include the number of horizontal and vertical partitions of the window, the number of histogram bins for gradients, hue and saturation, and the number of boosted trees and their depths for the classifier; these were selected based on the performance of the detector on the validation set. The validation set contained 15 images for each object class and 15 images containing none of them, and does not overlap with the training or test sets. We select the classifier that gives the lowest False Positive Rate.

5. Experiments

To validate the model, we created a dataset with images downloaded from the Internet and other datasets [9, 2, 14]. The dataset has images for 3 interactions: legs with soccer ball, hands with basketball, and miscellaneous. For ease of writing, we refer to these as soccer, basketball and misc respectively. The soccer set includes images of players kicking or dribbling the ball, the basketball set has images of players shooting or holding the ball, and the misc set includes images from the People dataset [14] and the INRIA pedestrian dataset [2]. Similar to the People dataset [14], we resize each image such that the person is roughly 100 pixels high. Figure 5 shows some sample images from our dataset. Note that we do not evaluate our algorithm on the existing dataset [9], as our system assumes that the entire person is within the image.

5.1. Object Detection

We evaluate the object detection for each object type, i.e. soccer ball and basketball.
Figure 6 shows some positive examples from the training set. For evaluation, we consider a detection hypothesis to be correct if the detection bounding box overlaps the ground truth for the same class by more than 50%. We first describe the detection parameters used in our experiments, and then evaluate the detectors on the test set.

Figure 6. Sample positive examples used for training object detectors; rows 1 and 2 show positive examples for basketball and soccer ball respectively.

[Detection Parameters]: As mentioned previously in the Learning section, we set the detection parameters for each object based on its performance on the validation set. We select the detection parameters by experimenting over the grid size (1×1, 3×3, 5×5), the number of histogram bins for gradients (8, 10, 12 over 180 degrees), hue (8, 10, 12) and saturation (4, 5, 6), the number of boosted trees (100, 150, 200) and their depths (1, 2) for the classifier, and the threshold on detection confidence (0.2, 0.3, ..., 0.9). We use the same training window size for both soccer ball and basketball. For both objects, we select the detection parameter setting that gives the lowest False Positive Rate with a miss rate of at most 20%. For soccer ball, the detector trained with 12 gradient orientation, 10 hue and 4 saturation bins over a 5×5 grid gave the lowest false positives per window (FPPW) when boosting over 150 trees. For basketball, on the other hand, 200 boosted trees of depth 2 on a 5×5 grid gave the lowest FPPW for 12 HOG, 8 hue and 6 saturation bins.

[Evaluation]: We evaluate the detectors by applying them on test images at known scales. To compute the detection accuracy for each detector on the test set, we merge overlapping windows after rejecting responses below a threshold. The detection and false alarm rate for each detector on the entire test set is reported in Table 1 for a threshold of 0.5.

Table 1. Object detection accuracy over the test set.
                Detection Rate    False Positives PW
Soccer ball     91.7%
Basketball      84.2%

Figure 5. Sample images from the dataset; rows 1 and 2 contain examples from the basketball class, rows 3 and 4 from soccer, and row 5 from misc.

5.2. Pose Estimation

For evaluation, we compute the pose estimation accuracy over the entire dataset with and without using the object context. Pose accuracy is computed as the average correctness of each body part over all the images (total of 650 parts). An estimated body part is considered correct if its segment endpoints lie within 50% of the length of the ground-truth segment from their annotated location, as in earlier reported results [5, 1]. To demonstrate the effectiveness of each module in our algorithm, we compute pose accuracy using 3 approaches: Method A, which estimates the human pose using [14] without any contextual information; Method B, which estimates the human pose with a known object, i.e. using the pose context tree; and Method C, which jointly infers the object and estimates the human pose in the image. Note that Method B essentially corresponds to the case when all interactions are correctly recognized and the joint pose and object estimation is done using pose context trees; hence the performance of Method B gives an upper bound on the performance of Method C, which is the fully automatic approach. Figure 8 shows sample results obtained using all 3 methods.

Table 2. Pose accuracy over the entire dataset with a total of 65 × 10 = 650 parts. S corresponds to the accuracy over images in the soccer set, B to the basketball set, and M to the misc set.
Approach                 S     B     M     Overall
A  No context [14]
B  KnownObject-PCT
C  Multiple PCT

The pose accuracy obtained using the above methods is shown in Table 2. Notice that the use of contextual knowledge improves the pose accuracy by 9%, and that using our approach, which is fully automatic, we can obtain an increase of 5%. To clearly demonstrate the strength of our model, we also report the accuracy over the parts involved in the interactions in the soccer and basketball sets. As shown in Table 3, the methods using context (B and C) significantly outperform Method A, which does not use context. Notice that the improvement in accuracy is especially high for the basketball set. This is because the basketball set is significantly more

cluttered than the soccer and misc sets, and hence pose estimation is much harder; the use of context provides additional constraints that help more accurate pose estimation.

Table 3. Accuracy over parts involved in the object interaction; for soccer only the legs are considered and for basketball only the hands; thus, accuracy is computed over 44 × 4 = 176 parts.
Approach                 S(legs)    B(hands)    Overall
A  No context [14]
B  KnownObject-PCT
C  Multiple PCT

For pose accuracy to improve using contextual information, the interaction in the image must also be correctly inferred. Thus, in addition to the pose accuracy, we also compute the accuracy of interaction categorization. For comparison, we use an alternative approach that categorizes an image using the joint spatial likelihood of the detected object and the human pose estimated without using context. This is similar to the scene and object recognition approach [13]. Figure 7 shows the confusion matrices for both methods, with the object-based approach as (a) and the use of multiple pose context trees as (b). Notice that the average categorization accuracy using multiple pose context trees is much higher.

Figure 7. Confusion matrix for interaction categorization over the classes Basketball, Soccer and Misc: (a) using scene object detection (recognition rate 80.6%), (b) using multiple pose context trees (recognition rate 90%).

6. Conclusion

In this paper we proposed an approach to estimate the human pose when interacting with a scene object, and demonstrated that the joint inference of the human pose and object increases pose accuracy. We proposed Pose Context Trees to jointly model the human pose and the object interaction, such as dribbling or kicking the soccer ball.
To simultaneously infer the interaction category and estimate the human pose, our algorithm considers multiple pose context trees, one for each possible human-object interaction, and finds the tree that gives the highest joint likelihood score. We applied our approach to estimate human pose in a single image over a dataset of 65 images with 3 interactions, including a category with assorted unknown interactions, and demonstrated that the use of contextual information improves pose accuracy by about 5% (8% over the parts involved in the interaction, such as legs for soccer).

Figure 8. Results on Pose Dataset. Columns: Input, Iterative Parse, KnownObject-PCT, Multiple PCT. (a), (b), (c) are images from the basketball set and (d), (e) are from the soccer set. The posterior distributions are also shown for the Iterative Parsing approach and for PCT when the action and object position are known. Notice that even in cases where the MAP pose is similar, the pose distribution obtained using PCT is closer to the ground truth. Soccer ball responses are marked in white, and basketballs are marked in yellow. In example (c), the basketball gets detected as a soccer ball and thus results in a poor pose estimate using Multiple-PCT; however, when the context is known, the true pose is detected using PCT.

References

[1] M. Andriluka, S. Roth, and B. Schiele. Pictorial structures revisited: People detection and articulated pose estimation. In IEEE CVPR, June 2009.
[2] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, volume 1, June.
[3] C. Desai, D. Ramanan, and C. Fowlkes. Discriminative models for multi-class object layout. In International Conference on Computer Vision (ICCV).
[4] P. Felzenszwalb and D. P. Huttenlocher. Pictorial structures for object recognition. IJCV, 61(1):55-79, 2005.
[5] V. Ferrari, M. Marin-Jimenez, and A. Zisserman. Progressive search space reduction for human pose estimation. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1-8, 2008.
[6] V. Ferrari, M. Marin-Jimenez, and A. Zisserman. Pose search: Retrieving people using their pose. In CVPR, pages 1-8.
[7] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. In The Annals of Statistics, volume 38.
[8] A. Gupta and L. S. Davis. Objects in action: An approach for combining action understanding and object perception. In CVPR, volume 0, pages 1-8.
[9] A. Gupta, A. Kembhavi, and L. S. Davis. Observing human-object interactions: Using spatial and functional compatibility for recognition. In T-PAMI, volume 31.
[10] G. Hua, M.-H. Yang, and Y. Wu. Learning to estimate human pose with data driven belief propagation. In IEEE CVPR, volume 2, June 2005.
[11] F. Kschischang, B. Frey, and H.-A. Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2), Feb.
[12] M. Lee and I. Cohen. Proposal maps driven MCMC for estimating human body pose in static images. In IEEE CVPR, 2004.
[13] L.-J. Li and L. Fei-Fei. What, where and who? Classifying event by scene and object recognition. In ICCV.
[14] D. Ramanan. Learning to parse images of articulated bodies. In Advances in Neural Information Processing Systems 19. MIT Press, Cambridge, MA.
[15] D. Ramanan and C. Sminchisescu. Training deformable models for localization. In IEEE CVPR, volume 1, June 2006.
[16] L. Sigal, B. Sidharth, S. Roth, M. Black, and M. Isard. Tracking loose-limbed people. In IEEE CVPR, volume I, June 2004.
[17] Y. Wang, H. Jiang, M. S. Drew, Z.-N. Li, and G. Mori. Unsupervised discovery of action classes. In CVPR.
[18] J. Zhang, J. Luo, R. Collins, and Y. Liu. Body localization in still images using hierarchical models and hybrid search. In IEEE CVPR, volume II, June 2006.


More information

Object Detection Design challenges

Object Detection Design challenges Object Detection Design challenges How to efficiently search for likely objects Even simple models require searching hundreds of thousands of positions and scales Feature design and scoring How should

More information

Human Upper Body Pose Estimation in Static Images

Human Upper Body Pose Estimation in Static Images 1. Research Team Human Upper Body Pose Estimation in Static Images Project Leader: Graduate Students: Prof. Isaac Cohen, Computer Science Mun Wai Lee 2. Statement of Project Goals This goal of this project

More information

Action Recognition in Cluttered Dynamic Scenes using Pose-Specific Part Models

Action Recognition in Cluttered Dynamic Scenes using Pose-Specific Part Models Action Recognition in Cluttered Dynamic Scenes using Pose-Specific Part Models Vivek Kumar Singh University of Southern California Los Angeles, CA, USA viveksin@usc.edu Ram Nevatia University of Southern

More information

Part-Based Models for Object Class Recognition Part 2

Part-Based Models for Object Class Recognition Part 2 High Level Computer Vision Part-Based Models for Object Class Recognition Part 2 Bernt Schiele - schiele@mpi-inf.mpg.de Mario Fritz - mfritz@mpi-inf.mpg.de https://www.mpi-inf.mpg.de/hlcv Class of Object

More information

Part-Based Models for Object Class Recognition Part 2

Part-Based Models for Object Class Recognition Part 2 High Level Computer Vision Part-Based Models for Object Class Recognition Part 2 Bernt Schiele - schiele@mpi-inf.mpg.de Mario Fritz - mfritz@mpi-inf.mpg.de https://www.mpi-inf.mpg.de/hlcv Class of Object

More information

Segmentation. Bottom up Segmentation Semantic Segmentation

Segmentation. Bottom up Segmentation Semantic Segmentation Segmentation Bottom up Segmentation Semantic Segmentation Semantic Labeling of Street Scenes Ground Truth Labels 11 classes, almost all occur simultaneously, large changes in viewpoint, scale sky, road,

More information

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation M. Blauth, E. Kraft, F. Hirschenberger, M. Böhm Fraunhofer Institute for Industrial Mathematics, Fraunhofer-Platz 1,

More information

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011 Previously Part-based and local feature models for generic object recognition Wed, April 20 UT-Austin Discriminative classifiers Boosting Nearest neighbors Support vector machines Useful for object recognition

More information

High Level Computer Vision

High Level Computer Vision High Level Computer Vision Part-Based Models for Object Class Recognition Part 2 Bernt Schiele - schiele@mpi-inf.mpg.de Mario Fritz - mfritz@mpi-inf.mpg.de http://www.d2.mpi-inf.mpg.de/cv Please Note No

More information

Detection III: Analyzing and Debugging Detection Methods

Detection III: Analyzing and Debugging Detection Methods CS 1699: Intro to Computer Vision Detection III: Analyzing and Debugging Detection Methods Prof. Adriana Kovashka University of Pittsburgh November 17, 2015 Today Review: Deformable part models How can

More information

Tri-modal Human Body Segmentation

Tri-modal Human Body Segmentation Tri-modal Human Body Segmentation Master of Science Thesis Cristina Palmero Cantariño Advisor: Sergio Escalera Guerrero February 6, 2014 Outline 1 Introduction 2 Tri-modal dataset 3 Proposed baseline 4

More information

Recognizing Human Actions from Still Images with Latent Poses

Recognizing Human Actions from Still Images with Latent Poses Recognizing Human Actions from Still Images with Latent Poses Weilong Yang, Yang Wang, and Greg Mori School of Computing Science Simon Fraser University Burnaby, BC, Canada wya16@sfu.ca, ywang12@cs.sfu.ca,

More information

Category-level localization

Category-level localization Category-level localization Cordelia Schmid Recognition Classification Object present/absent in an image Often presence of a significant amount of background clutter Localization / Detection Localize object

More information

Efficient Detector Adaptation for Object Detection in a Video

Efficient Detector Adaptation for Object Detection in a Video 2013 IEEE Conference on Computer Vision and Pattern Recognition Efficient Detector Adaptation for Object Detection in a Video Pramod Sharma and Ram Nevatia Institute for Robotics and Intelligent Systems,

More information

The Pennsylvania State University. The Graduate School. College of Engineering ONLINE LIVESTREAM CAMERA CALIBRATION FROM CROWD SCENE VIDEOS

The Pennsylvania State University. The Graduate School. College of Engineering ONLINE LIVESTREAM CAMERA CALIBRATION FROM CROWD SCENE VIDEOS The Pennsylvania State University The Graduate School College of Engineering ONLINE LIVESTREAM CAMERA CALIBRATION FROM CROWD SCENE VIDEOS A Thesis in Computer Science and Engineering by Anindita Bandyopadhyay

More information

Part-based and local feature models for generic object recognition

Part-based and local feature models for generic object recognition Part-based and local feature models for generic object recognition May 28 th, 2015 Yong Jae Lee UC Davis Announcements PS2 grades up on SmartSite PS2 stats: Mean: 80.15 Standard Dev: 22.77 Vote on piazza

More information

Object Detection by 3D Aspectlets and Occlusion Reasoning

Object Detection by 3D Aspectlets and Occlusion Reasoning Object Detection by 3D Aspectlets and Occlusion Reasoning Yu Xiang University of Michigan Silvio Savarese Stanford University In the 4th International IEEE Workshop on 3D Representation and Recognition

More information

Articulated Pose Estimation with Flexible Mixtures-of-Parts

Articulated Pose Estimation with Flexible Mixtures-of-Parts Articulated Pose Estimation with Flexible Mixtures-of-Parts PRESENTATION: JESSE DAVIS CS 3710 VISUAL RECOGNITION Outline Modeling Special Cases Inferences Learning Experiments Problem and Relevance Problem:

More information

Detection of Multiple, Partially Occluded Humans in a Single Image by Bayesian Combination of Edgelet Part Detectors

Detection of Multiple, Partially Occluded Humans in a Single Image by Bayesian Combination of Edgelet Part Detectors Detection of Multiple, Partially Occluded Humans in a Single Image by Bayesian Combination of Edgelet Part Detectors Bo Wu Ram Nevatia University of Southern California Institute for Robotics and Intelligent

More information

Beyond Bags of features Spatial information & Shape models

Beyond Bags of features Spatial information & Shape models Beyond Bags of features Spatial information & Shape models Jana Kosecka Many slides adapted from S. Lazebnik, FeiFei Li, Rob Fergus, and Antonio Torralba Detection, recognition (so far )! Bags of features

More information

Selective Search for Object Recognition

Selective Search for Object Recognition Selective Search for Object Recognition Uijlings et al. Schuyler Smith Overview Introduction Object Recognition Selective Search Similarity Metrics Results Object Recognition Kitten Goal: Problem: Where

More information

Detecting Object Instances Without Discriminative Features

Detecting Object Instances Without Discriminative Features Detecting Object Instances Without Discriminative Features Edward Hsiao June 19, 2013 Thesis Committee: Martial Hebert, Chair Alexei Efros Takeo Kanade Andrew Zisserman, University of Oxford 1 Object Instance

More information

Part based models for recognition. Kristen Grauman

Part based models for recognition. Kristen Grauman Part based models for recognition Kristen Grauman UT Austin Limitations of window-based models Not all objects are box-shaped Assuming specific 2d view of object Local components themselves do not necessarily

More information

Detecting and Segmenting Humans in Crowded Scenes

Detecting and Segmenting Humans in Crowded Scenes Detecting and Segmenting Humans in Crowded Scenes Mikel D. Rodriguez University of Central Florida 4000 Central Florida Blvd Orlando, Florida, 32816 mikel@cs.ucf.edu Mubarak Shah University of Central

More information

Detecting Pedestrians by Learning Shapelet Features

Detecting Pedestrians by Learning Shapelet Features Detecting Pedestrians by Learning Shapelet Features Payam Sabzmeydani and Greg Mori School of Computing Science Simon Fraser University Burnaby, BC, Canada {psabzmey,mori}@cs.sfu.ca Abstract In this paper,

More information

Learning to parse images of articulated bodies

Learning to parse images of articulated bodies Learning to parse images of articulated bodies Deva Ramanan Toyota Technological Institute at Chicago Chicago, IL 60637 ramanan@tti-c.org Abstract We consider the machine vision task of pose estimation

More information

HUMAN PARSING WITH A CASCADE OF HIERARCHICAL POSELET BASED PRUNERS

HUMAN PARSING WITH A CASCADE OF HIERARCHICAL POSELET BASED PRUNERS HUMAN PARSING WITH A CASCADE OF HIERARCHICAL POSELET BASED PRUNERS Duan Tran Yang Wang David Forsyth University of Illinois at Urbana Champaign University of Manitoba ABSTRACT We address the problem of

More information

Local cues and global constraints in image understanding

Local cues and global constraints in image understanding Local cues and global constraints in image understanding Olga Barinova Lomonosov Moscow State University *Many slides adopted from the courses of Anton Konushin Image understanding «To see means to know

More information

Discriminative classifiers for image recognition

Discriminative classifiers for image recognition Discriminative classifiers for image recognition May 26 th, 2015 Yong Jae Lee UC Davis Outline Last time: window-based generic object detection basic pipeline face detection with boosting as case study

More information

Object Recognition with Deformable Models

Object Recognition with Deformable Models Object Recognition with Deformable Models Pedro F. Felzenszwalb Department of Computer Science University of Chicago Joint work with: Dan Huttenlocher, Joshua Schwartz, David McAllester, Deva Ramanan.

More information

Finding people in repeated shots of the same scene

Finding people in repeated shots of the same scene 1 Finding people in repeated shots of the same scene Josef Sivic 1 C. Lawrence Zitnick Richard Szeliski 1 University of Oxford Microsoft Research Abstract The goal of this work is to find all occurrences

More information

Action recognition in videos

Action recognition in videos Action recognition in videos Cordelia Schmid INRIA Grenoble Joint work with V. Ferrari, A. Gaidon, Z. Harchaoui, A. Klaeser, A. Prest, H. Wang Action recognition - goal Short actions, i.e. drinking, sit

More information

Human Motion Detection and Tracking for Video Surveillance

Human Motion Detection and Tracking for Video Surveillance Human Motion Detection and Tracking for Video Surveillance Prithviraj Banerjee and Somnath Sengupta Department of Electronics and Electrical Communication Engineering Indian Institute of Technology, Kharagpur,

More information

Find that! Visual Object Detection Primer

Find that! Visual Object Detection Primer Find that! Visual Object Detection Primer SkTech/MIT Innovation Workshop August 16, 2012 Dr. Tomasz Malisiewicz tomasz@csail.mit.edu Find that! Your Goals...imagine one such system that drives information

More information

Multiple-Person Tracking by Detection

Multiple-Person Tracking by Detection http://excel.fit.vutbr.cz Multiple-Person Tracking by Detection Jakub Vojvoda* Abstract Detection and tracking of multiple person is challenging problem mainly due to complexity of scene and large intra-class

More information

Object Category Detection: Sliding Windows

Object Category Detection: Sliding Windows 03/18/10 Object Category Detection: Sliding Windows Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem Goal: Detect all instances of objects Influential Works in Detection Sung-Poggio

More information

CRFs for Image Classification

CRFs for Image Classification CRFs for Image Classification Devi Parikh and Dhruv Batra Carnegie Mellon University Pittsburgh, PA 15213 {dparikh,dbatra}@ece.cmu.edu Abstract We use Conditional Random Fields (CRFs) to classify regions

More information

Histograms of Oriented Gradients for Human Detection p. 1/1

Histograms of Oriented Gradients for Human Detection p. 1/1 Histograms of Oriented Gradients for Human Detection p. 1/1 Histograms of Oriented Gradients for Human Detection Navneet Dalal and Bill Triggs INRIA Rhône-Alpes Grenoble, France Funding: acemedia, LAVA,

More information

Left-Luggage Detection using Bayesian Inference

Left-Luggage Detection using Bayesian Inference Left-Luggage Detection using Bayesian Inference Fengjun Lv Xuefeng Song Bo Wu Vivek Kumar Singh Ramakant Nevatia Institute for Robotics and Intelligent Systems University of Southern California Los Angeles,

More information

Learning to Segment Document Images

Learning to Segment Document Images Learning to Segment Document Images K.S. Sesh Kumar, Anoop Namboodiri, and C.V. Jawahar Centre for Visual Information Technology, International Institute of Information Technology, Hyderabad, India Abstract.

More information

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009 Learning and Inferring Depth from Monocular Images Jiyan Pan April 1, 2009 Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer

More information

Nonflat Observation Model and Adaptive Depth Order Estimation for 3D Human Pose Tracking

Nonflat Observation Model and Adaptive Depth Order Estimation for 3D Human Pose Tracking Nonflat Observation Model and Adaptive Depth Order Estimation for 3D Human Pose Tracking Nam-Gyu Cho, Alan Yuille and Seong-Whan Lee Department of Brain and Cognitive Engineering, orea University, orea

More information

Fast Human Detection Using a Cascade of Histograms of Oriented Gradients

Fast Human Detection Using a Cascade of Histograms of Oriented Gradients MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Fast Human Detection Using a Cascade of Histograms of Oriented Gradients Qiang Zhu, Shai Avidan, Mei-Chen Yeh, Kwang-Ting Cheng TR26-68 June

More information

Pictorial Structures for Object Recognition

Pictorial Structures for Object Recognition Pictorial Structures for Object Recognition Felzenszwalb and Huttenlocher Presented by Stephen Krotosky Pictorial Structures Introduced by Fischler and Elschlager in 1973 Objects are modeled by a collection

More information

Object Detection with Discriminatively Trained Part Based Models

Object Detection with Discriminatively Trained Part Based Models Object Detection with Discriminatively Trained Part Based Models Pedro F. Felzenszwelb, Ross B. Girshick, David McAllester and Deva Ramanan Presented by Fabricio Santolin da Silva Kaustav Basu Some slides

More information

Tracking People. Tracking People: Context

Tracking People. Tracking People: Context Tracking People A presentation of Deva Ramanan s Finding and Tracking People from the Bottom Up and Strike a Pose: Tracking People by Finding Stylized Poses Tracking People: Context Motion Capture Surveillance

More information

Learning to Estimate Human Pose with Data Driven Belief Propagation

Learning to Estimate Human Pose with Data Driven Belief Propagation Learning to Estimate Human Pose with Data Driven Belief Propagation Gang Hua Ming-Hsuan Yang Ying Wu ECE Department, Northwestern University Honda Research Institute Evanston, IL 60208, U.S.A. Mountain

More information

Separating Objects and Clutter in Indoor Scenes

Separating Objects and Clutter in Indoor Scenes Separating Objects and Clutter in Indoor Scenes Salman H. Khan School of Computer Science & Software Engineering, The University of Western Australia Co-authors: Xuming He, Mohammed Bennamoun, Ferdous

More information

Feature descriptors. Alain Pagani Prof. Didier Stricker. Computer Vision: Object and People Tracking

Feature descriptors. Alain Pagani Prof. Didier Stricker. Computer Vision: Object and People Tracking Feature descriptors Alain Pagani Prof. Didier Stricker Computer Vision: Object and People Tracking 1 Overview Previous lectures: Feature extraction Today: Gradiant/edge Points (Kanade-Tomasi + Harris)

More information

Human detection using local shape and nonredundant

Human detection using local shape and nonredundant University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2010 Human detection using local shape and nonredundant binary patterns

More information

Estimating Human Pose in Images. Navraj Singh December 11, 2009

Estimating Human Pose in Images. Navraj Singh December 11, 2009 Estimating Human Pose in Images Navraj Singh December 11, 2009 Introduction This project attempts to improve the performance of an existing method of estimating the pose of humans in still images. Tasks

More information

Object detection using non-redundant local Binary Patterns

Object detection using non-redundant local Binary Patterns University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2010 Object detection using non-redundant local Binary Patterns Duc Thanh

More information

OCCLUSION BOUNDARIES ESTIMATION FROM A HIGH-RESOLUTION SAR IMAGE

OCCLUSION BOUNDARIES ESTIMATION FROM A HIGH-RESOLUTION SAR IMAGE OCCLUSION BOUNDARIES ESTIMATION FROM A HIGH-RESOLUTION SAR IMAGE Wenju He, Marc Jäger, and Olaf Hellwich Berlin University of Technology FR3-1, Franklinstr. 28, 10587 Berlin, Germany {wenjuhe, jaeger,

More information

Histogram of Oriented Gradients for Human Detection

Histogram of Oriented Gradients for Human Detection Histogram of Oriented Gradients for Human Detection Article by Navneet Dalal and Bill Triggs All images in presentation is taken from article Presentation by Inge Edward Halsaunet Introduction What: Detect

More information

Linear combinations of simple classifiers for the PASCAL challenge

Linear combinations of simple classifiers for the PASCAL challenge Linear combinations of simple classifiers for the PASCAL challenge Nik A. Melchior and David Lee 16 721 Advanced Perception The Robotics Institute Carnegie Mellon University Email: melchior@cmu.edu, dlee1@andrew.cmu.edu

More information

Human detection using histogram of oriented gradients. Srikumar Ramalingam School of Computing University of Utah

Human detection using histogram of oriented gradients. Srikumar Ramalingam School of Computing University of Utah Human detection using histogram of oriented gradients Srikumar Ramalingam School of Computing University of Utah Reference Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection,

More information

Pairwise Threshold for Gaussian Mixture Classification and its Application on Human Tracking Enhancement

Pairwise Threshold for Gaussian Mixture Classification and its Application on Human Tracking Enhancement Pairwise Threshold for Gaussian Mixture Classification and its Application on Human Tracking Enhancement Daegeon Kim Sung Chun Lee Institute for Robotics and Intelligent Systems University of Southern

More information

Object Recognition II

Object Recognition II Object Recognition II Linda Shapiro EE/CSE 576 with CNN slides from Ross Girshick 1 Outline Object detection the task, evaluation, datasets Convolutional Neural Networks (CNNs) overview and history Region-based

More information

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009 Analysis: TextonBoost and Semantic Texton Forests Daniel Munoz 16-721 Februrary 9, 2009 Papers [shotton-eccv-06] J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost: Joint Appearance, Shape and Context

More information

PEDESTRIAN DETECTION IN CROWDED SCENES VIA SCALE AND OCCLUSION ANALYSIS

PEDESTRIAN DETECTION IN CROWDED SCENES VIA SCALE AND OCCLUSION ANALYSIS PEDESTRIAN DETECTION IN CROWDED SCENES VIA SCALE AND OCCLUSION ANALYSIS Lu Wang Lisheng Xu Ming-Hsuan Yang Northeastern University, China University of California at Merced, USA ABSTRACT Despite significant

More information

A Hierarchical Compositional System for Rapid Object Detection

A Hierarchical Compositional System for Rapid Object Detection A Hierarchical Compositional System for Rapid Object Detection Long Zhu and Alan Yuille Department of Statistics University of California at Los Angeles Los Angeles, CA 90095 {lzhu,yuille}@stat.ucla.edu

More information

Conditional Random Fields for Object Recognition

Conditional Random Fields for Object Recognition Conditional Random Fields for Object Recognition Ariadna Quattoni Michael Collins Trevor Darrell MIT Computer Science and Artificial Intelligence Laboratory Cambridge, MA 02139 {ariadna, mcollins, trevor}@csail.mit.edu

More information

Selection of Scale-Invariant Parts for Object Class Recognition

Selection of Scale-Invariant Parts for Object Class Recognition Selection of Scale-Invariant Parts for Object Class Recognition Gy. Dorkó and C. Schmid INRIA Rhône-Alpes, GRAVIR-CNRS 655, av. de l Europe, 3833 Montbonnot, France fdorko,schmidg@inrialpes.fr Abstract

More information

Window based detectors

Window based detectors Window based detectors CS 554 Computer Vision Pinar Duygulu Bilkent University (Source: James Hays, Brown) Today Window-based generic object detection basic pipeline boosting classifiers face detection

More information

Real-Time Human Detection using Relational Depth Similarity Features

Real-Time Human Detection using Relational Depth Similarity Features Real-Time Human Detection using Relational Depth Similarity Features Sho Ikemura, Hironobu Fujiyoshi Dept. of Computer Science, Chubu University. Matsumoto 1200, Kasugai, Aichi, 487-8501 Japan. si@vision.cs.chubu.ac.jp,

More information

Pedestrian Detection with Occlusion Handling

Pedestrian Detection with Occlusion Handling Pedestrian Detection with Occlusion Handling Yawar Rehman 1, Irfan Riaz 2, Fan Xue 3, Jingchun Piao 4, Jameel Ahmed Khan 5 and Hyunchul Shin 6 Department of Electronics and Communication Engineering, Hanyang

More information

Person Detection in Images using HoG + Gentleboost. Rahul Rajan June 1st July 15th CMU Q Robotics Lab

Person Detection in Images using HoG + Gentleboost. Rahul Rajan June 1st July 15th CMU Q Robotics Lab Person Detection in Images using HoG + Gentleboost Rahul Rajan June 1st July 15th CMU Q Robotics Lab 1 Introduction One of the goals of computer vision Object class detection car, animal, humans Human

More information

Object Recognition Using Pictorial Structures. Daniel Huttenlocher Computer Science Department. In This Talk. Object recognition in computer vision

Object Recognition Using Pictorial Structures. Daniel Huttenlocher Computer Science Department. In This Talk. Object recognition in computer vision Object Recognition Using Pictorial Structures Daniel Huttenlocher Computer Science Department Joint work with Pedro Felzenszwalb, MIT AI Lab In This Talk Object recognition in computer vision Brief definition

More information

CS 664 Flexible Templates. Daniel Huttenlocher

CS 664 Flexible Templates. Daniel Huttenlocher CS 664 Flexible Templates Daniel Huttenlocher Flexible Template Matching Pictorial structures Parts connected by springs and appearance models for each part Used for human bodies, faces Fischler&Elschlager,

More information

Definition, Detection, and Evaluation of Meeting Events in Airport Surveillance Videos

Definition, Detection, and Evaluation of Meeting Events in Airport Surveillance Videos Definition, Detection, and Evaluation of Meeting Events in Airport Surveillance Videos Sung Chun Lee, Chang Huang, and Ram Nevatia University of Southern California, Los Angeles, CA 90089, USA sungchun@usc.edu,

More information

DPM Score Regressor for Detecting Occluded Humans from Depth Images

DPM Score Regressor for Detecting Occluded Humans from Depth Images DPM Score Regressor for Detecting Occluded Humans from Depth Images Tsuyoshi Usami, Hiroshi Fukui, Yuji Yamauchi, Takayoshi Yamashita and Hironobu Fujiyoshi Email: usami915@vision.cs.chubu.ac.jp Email:

More information

Face detection and recognition. Detection Recognition Sally

Face detection and recognition. Detection Recognition Sally Face detection and recognition Detection Recognition Sally Face detection & recognition Viola & Jones detector Available in open CV Face recognition Eigenfaces for face recognition Metric learning identification

More information

Learning and Recognizing Visual Object Categories Without First Detecting Features

Learning and Recognizing Visual Object Categories Without First Detecting Features Learning and Recognizing Visual Object Categories Without First Detecting Features Daniel Huttenlocher 2007 Joint work with D. Crandall and P. Felzenszwalb Object Category Recognition Generic classes rather

More information

Visual Motion Analysis and Tracking Part II

Visual Motion Analysis and Tracking Part II Visual Motion Analysis and Tracking Part II David J Fleet and Allan D Jepson CIAR NCAP Summer School July 12-16, 16, 2005 Outline Optical Flow and Tracking: Optical flow estimation (robust, iterative refinement,

More information

2D Image Processing Feature Descriptors

2D Image Processing Feature Descriptors 2D Image Processing Feature Descriptors Prof. Didier Stricker Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de 1 Overview

More information

A Generative Model for Simultaneous Estimation of Human Body Shape and Pixel-level Segmentation

A Generative Model for Simultaneous Estimation of Human Body Shape and Pixel-level Segmentation A Generative Model for Simultaneous Estimation of Human Body Shape and Pixel-level Segmentation Ingmar Rauschert and Robert T. Collins Pennsylvania State University, University Park, 16802 PA, USA Abstract.

More information

Located Hidden Random Fields: Learning Discriminative Parts for Object Detection

Located Hidden Random Fields: Learning Discriminative Parts for Object Detection Located Hidden Random Fields: Learning Discriminative Parts for Object Detection Ashish Kapoor 1 and John Winn 2 1 MIT Media Laboratory, Cambridge, MA 02139, USA kapoor@media.mit.edu 2 Microsoft Research,

More information

Mobile Human Detection Systems based on Sliding Windows Approach-A Review

Mobile Human Detection Systems based on Sliding Windows Approach-A Review Mobile Human Detection Systems based on Sliding Windows Approach-A Review Seminar: Mobile Human detection systems Njieutcheu Tassi cedrique Rovile Department of Computer Engineering University of Heidelberg

More information

Scene Grammars, Factor Graphs, and Belief Propagation

Scene Grammars, Factor Graphs, and Belief Propagation Scene Grammars, Factor Graphs, and Belief Propagation Pedro Felzenszwalb Brown University Joint work with Jeroen Chua Probabilistic Scene Grammars General purpose framework for image understanding and

More information

Hierarchical Part-Based Human Body Pose Estimation

Hierarchical Part-Based Human Body Pose Estimation Hierarchical Part-Based Human Body Pose Estimation R. Navaratnam A. Thayananthan P. H. S. Torr R. Cipolla University of Cambridge Oxford Brookes Univeristy Department of Engineering Cambridge, CB2 1PZ,

More information

CS 231A Computer Vision (Fall 2011) Problem Set 4

CS 231A Computer Vision (Fall 2011) Problem Set 4 CS 231A Computer Vision (Fall 2011) Problem Set 4 Due: Nov. 30 th, 2011 (9:30am) 1 Part-based models for Object Recognition (50 points) One approach to object recognition is to use a deformable part-based

More information

Motion Tracking and Event Understanding in Video Sequences

Motion Tracking and Event Understanding in Video Sequences Motion Tracking and Event Understanding in Video Sequences Isaac Cohen Elaine Kang, Jinman Kang Institute for Robotics and Intelligent Systems University of Southern California Los Angeles, CA Objectives!

More information

Scene Grammars, Factor Graphs, and Belief Propagation

Scene Grammars, Factor Graphs, and Belief Propagation Scene Grammars, Factor Graphs, and Belief Propagation Pedro Felzenszwalb Brown University Joint work with Jeroen Chua Probabilistic Scene Grammars General purpose framework for image understanding and

More information

Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients

Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients ThreeDimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients Authors: Zhile Ren, Erik B. Sudderth Presented by: Shannon Kao, Max Wang October 19, 2016 Introduction Given an

More information

A Comparison on Different 2D Human Pose Estimation Method

A Comparison on Different 2D Human Pose Estimation Method A Comparison on Different 2D Human Pose Estimation Method Santhanu P.Mohan 1, Athira Sugathan 2, Sumesh Sekharan 3 1 Assistant Professor, IT Department, VJCET, Kerala, India 2 Student, CVIP, ASE, Tamilnadu,

More information

Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach

Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach Human Detection and Tracking for Video Surveillance: A Cognitive Science Approach Vandit Gajjar gajjar.vandit.381@ldce.ac.in Ayesha Gurnani gurnani.ayesha.52@ldce.ac.in Yash Khandhediya khandhediya.yash.364@ldce.ac.in

More information

Beyond Actions: Discriminative Models for Contextual Group Activities

Beyond Actions: Discriminative Models for Contextual Group Activities Beyond Actions: Discriminative Models for Contextual Group Activities Tian Lan School of Computing Science Simon Fraser University tla58@sfu.ca Weilong Yang School of Computing Science Simon Fraser University

More information