HUMAN PARSING WITH A CASCADE OF HIERARCHICAL POSELET BASED PRUNERS

Size: px
Start display at page:

Download "HUMAN PARSING WITH A CASCADE OF HIERARCHICAL POSELET BASED PRUNERS"

Transcription

1 HUMAN PARSING WITH A CASCADE OF HIERARCHICAL POSELET BASED PRUNERS Duan Tran Yang Wang David Forsyth University of Illinois at Urbana Champaign University of Manitoba ABSTRACT We address the problem of human parsing using part-based models. In particular, we consider part-based models that exploit rich pairwise relationship between parts, e.g. the color symmetry between left/right limbs. This poses a computational challenge since the state space of each part is very large, and algorithmic tricks (e.g. the distance transform) cannot be applied to handle these types of pairwise relationships. We propose to prune the state space of each part using a cascade of pruners. These pruners can filter out 99.6% of the states per part to about 500 states per part, while keeping the ground-truth states in the pruned state most of the time. In the pruned space, we can afford to apply human parsing models with more complex pairwise relationships between parts, such as the color symmetry. We demonstrate our method on a challenging human parsing dataset. Index Terms human pose estimation, gesture analysis 1. INTRODUCTION Part-based models (e.g. pictorial structures [1]) are a class of popular approaches in human parsing. These models represent the human pose as a collection of parts. Each part is represented by one or more appearance templates. An undirected graphical model is used to capture the pairwise relationships between pairs of parts that are connected. Many existing part-based approaches can be broadly categorized into three (possibly overlapping) classes: (1) Tractable models (e.g. kinematic tree) with exact inference (e.g. [1, 2]). These approaches typically use treestructured models and simple pairwise relationships between parts. Inference can often be done exactly. Tree-structured models have been shown to be very successful, in particular [2] has achieved the state-of-the-art results on several benchmark datasets. But they are severely restrictive in what can be modeled. For example, the color symmetry of left/right limbs are usually not captured in those models. (2) Intractable models with approximate inference (e.g. [3]). These approaches use non-tree loopy graph models and offer general insights on how to move beyond simple tree models in order to exploit richer relationships of parts. However, due to the computational difficulty, the pairwise relationship is still limited to only simple spatial relationships in most of these models. (3) Pruning-based approaches (e.g. [4, 5]). These approaches aim to reduce the search space using various strategies. We believe pruning-based approaches are important since they provide generic ideas that can be used with both tree-structured and non-tree models. In addition, both learning and inference can benefit from efficient pruning strategies. To motivate the need for pruning, let us take a closer look at the computational complexity of the inference algorithms in part-based models. For tree-structured models, the search over the joint pose space requires O(M K 2 ) computation, where M is the number of parts and K is the number of states for a part. In human parsing, M is usually small. However, the state space K for a part is often quite large. For example, in [1], it is roughly the number of pixels in an image. A brute-force search algorithm is often infeasible. In the literature, people have taken two approaches to address this computational issue. First, one can impose some restriction on the form of pairwise relationship in the model, so that fast algorithmic tricks can be applied. For example, the pictorial structure in [1] assumes the pairwise relationship is a Mahalanobis distance between the locations of a pair of parts. In that case, distance transform can be applied to reduce the complexity to O(M K). Unfortunately this approach can only handle some special forms of pairwise relationships. More complex pairwise relationships (e.g. color symmetry between left/right limbs) cannot be directly used due to the computational issue, even though they are very helpful. The second approach is to develop a strategy to prune the state space of each part, so we can apply brute-force search. Examples of such approaches include [5, 6]. Our work falls under the second approach. We learn a cascade of pruners to progressively filter out the state space for each part. Some pruners in the cascade are easy to compute and can be applied to quickly remove a large portion of the state space. Then more complex pruners are applied to further prune the space. In the end, we are left with a small state space for each part. In this pruned space, we can afford to apply rich models that involve complex pairwise relationships between parts. We demonstrate our approach on a challenging dataset involving very aggressive pose variations. We show that even though our final human parsing model is applied on a pruned state space, we can still outperforms other competing methods, since we can leverage the power of more complex pairwise relationships (e.g. color symmetry). Related work: Part-based representation is a popular approach in human parsing. Tree-structured models [7, 1, 8, 2] are commonly used due to their efficiency. Loopy models [9, 10, 11, 3] have also been developed, but they usually require approximate inference.

2 Many part-based models use discriminative learning to train the model parameters. Examples include the conditional random fields [12, 8], max-margin learning [13, 3, 2] and boosting [7, 5, 14]. Previous approaches have also explored various features, including image segments (superpixels) [15, 16, 17, 18, 5, 19], color features [8, 4], gradient features [7, 20, 3, 2]. Our cascaded pruning strategy is mostly related to Sapp et al. [5]. In their work, they use a coarse-to-fine cascade of pictorial structures for human parsing. Our work is different from [5] in that we consider larger parts in addition to rigid parts. The part-based model used in our work is mostly related to the hierarchical poselet human parsing in [3]. Most previous work of part-based models only considers rigid parts (e.g. torso, head, half limbs). The method in [3] extends traditional part-based models by introducing larger parts that are compositions of multiple rigid parts. Their argument is that those larger parts usually have distinctive appearance patterns that are easier to detect. Similar ideas have also been used in [21]. In our work, we apply hierarchical poselets in the setting of state space pruning. Although we build upon the part-based models in [3], it is important to keep in mind that we are proposing a general pruning framework that can be used with any part-based models. 2. PART-BASED MODELS FOR HUMAN PARSING We briefly review part-based models for human parsing in this section. In particular, we focus on the hierarchical poselet models introduced in [3] which our method builds upon. The human body is represented as a collection of M parts. A standard pictorial structure [1, 8, 2] uses 10 parts including head, torso, half limbs. We can use an undirected graph G = (V, E) to represent the pose configuration. A vertex i V in the graph corresponds to the i-th part. Its label l i represents the configuration of this part. In standard pictorial structures, l i encodes the (x, y) location and the orientation of the part. An edge (i, j) E in the graph captures certain constraints (e.g. spatial and/or appearance) between a pair of parts. Recently, Wang et al. [3] have extended standard partbased models by introducing larger parts that are compositions of several rigid parts into their model. In particular, they define 20 parts represented by a loopy graph shown in Fig. 1 (left). Each part is represented by several poselets a concept first introduced by Bourdev et al. [22, 23]. In a nutshell, poselets refer to body parts that are tightly clustered. In [3], the configuration l i of the i-th part encodes the (x, y) location and the poselet index of this part. Part-based models use a scoring function in the following form to measure the compatibility of an image I and a pose configuration L: C(L; I, W) = i V wi Φ i (l i ; I)+ wijφ ij (l i, l j ; I) (1) i,j E Fig. 1: (Left) The relational graph of the hierarchical poselet model in [3]. Each vertex represents a part. Each edge represents the pairwise relationship between two parts. (Right) A tree structured model obtained by removing some edges in the model on the left. where W = {w i, w ij : i V, (i, j) E} are model parameters. Φ i (l i ; I) is a unary potential of part i. Φ ij (l i, l j ; I) is the pairwise relation between part i and part j. In standard pictorial structures, the E forms a tree. While in [3], E forms a loopy graph. Finding the best pose configuration L involves solving the following inference problem (also known as MAP assignment): L = arg max L L C(L; I, W). This inference problem can be solved by belief propagation (BP) when E is a tree (as in [1, 8, 2]) or by loopy belief propagation (LBP) when E has cycles (as in [3]). As mentioned earlier, the computational bottleneck of applying BP/LBP is that it involves a computation of O(K 2 ) where K is the number of possible labels of a part. Typically, K can be about 10,000 for each part. Brute-force BP/LBP is obviously infeasible. When the pairwise potential has some special form, one can use some algorithmic tricks (e.g. distance transform [1]) to reduce the computation to O(K). Unfortunately, the distance transform cannot handle all forms of pairwise potentials. For example, the color symmetry between left/right limbs is a very useful cue for human parsing. But it is rarely used in previous work, mainly because it requires a pairwise potential that cannot be handled by the distance transform or other computation tricks. 3. HUMAN PARSING WITH CASCADED PRUNERS The main idea of our work is to develop a strategy to efficiently prune the search space of each part. In the pruned space, we will be able to use more complex models that involve richer pairwise relationships between parts (e.g. color symmetry between left/right limbs) by applying brute-force BP/LBP. In this section, we introduce a sequence of increasingly complex pruners that progressively reduce the search space. Our pruning strategy is inspired by Sapp et al. [5] that learn a coarse-to-fine cascade of pictorial structure models for human parsing. In particular, they prune the search space of l i using the max-marginal defined as follows: C(l i ; I, W) = max l L {C(l ; I, W) : l i = l i }. Let us define the best MAP assignment as C (I; W) = max l L {C(l; I; W)}. The method in [5] learns to prune the state space by considering the two competing objectives that trade off accuracy and

3 efficiency: t(i, W, α) = αc (I; W) + 1 α M M i=1 1 C(l i ; I) (2) L i l L i where α is a parameter to control how aggressively to prune. When α = 1, only the MAP assignment is kept. When α = 0, approximately half of the space is pruned. Sapp et al. [5] learn the parameter W for pruning using a max-margin approach. To do so, they minimize the hinge loss of the incorrectly pruned part states over the example set X = {(I n, L n )} N n=1 as follows: min λ 1 2 W N n examples ξ n (L n, I n ; W) (3) where ξ n (L n, I n ; W) = max{0, 1 + t n (I n, W, α) C(L n ; I n ; W)} is the hinge loss to measure the margin between the scores of the MAP assignment and the ground truth. This essentially learns W to ensure the score of ground-truth label is larger than the score of any other competing labels by a margin of 1. The update rule of W at iteration t + 1 is: W t+1 W t + η( λw t + W t), where W t = Φ(L; I) αφ(l ; I) (1 α) 1 1 Φ(l i ; I) M L i i l i L i where η is the learning rate, Φ(L; I) is the feature vector at the ground truth configuration L of the example I, Φ(L ; I) is the feature vector of the MAP assignment found with respect to W t. Our pruners build upon Sapp et al. [5]. But there are several important distinctions. First, the cascade method in [5] uses the same tree-structured model at each stage at different spatial resolutions in a coarse-to-fine manner. The coarselevel model uses simple features and is only applied at several coarse spatial positions in the image. The fine-level model uses more complex features and is applied at finer resolution in the space pruned by coarser models. In contrast, we use different models at different levels of the cascade. Second, the pruning framework in [5] is based on appearances of small parts (i.e. torso, head, half limbs) connected in the pictorial structure model. However, we would like to argue that at coarse levels, one should not even consider small parts since they are hard to distinguish. Instead, we should try to use appearance models from larger body parts that are easy to identify at the coarse level. In this paper, we use the hierarchical poselets introduced in [3] to learn appearance models for both small rigid parts, as well as large parts that are compositions (e.g. torso+legs) of several rigid parts. The detector responses of large parts will provide a contextual information to allow us to prune the search space of small parts. It is also interesting to compare our work with the pruning method by Ferrari et al. [4]. In that work, the human parsing algorithm is run only within the spatial region localized by a pre-trained upper-body detector. In that case, the upper-body detector plays a similar role as the pruner based on appearance models of large parts (i.e. upper-body). In the following, we introduce three types of increasingly complex pruners used in our cascade, namely part pruners (Sec 3.1), tree-based pruners (Sec 3.2), and enhanced tree-based pruners (Sec 3.3). These pruners are essentially models in forms similar to Eq. 1, but they differ in their definitions of state space of a part and the structure of their relational models. For part pruners and tree-based pruners, the state space l i of a part consists of the 2D locations in the image, i.e. l i = (x i, y i ). For enhanced tree-based pruners, the state space l i also includes the poselet index, i.e. l i = (x i, y i, z i ). For notational convenience, we will use l i in the following, but it is important to keep in mind that l i has slightly different meanings in different pruners Part pruners We call the first type of pruners the part pruners. Here the state space for a part is the 2D locations in the image. We consider a scoring function without pairwise potentials in the following form: C 1 (L; I, W) = i V w i Φ i(l i ; I), where Φ i (l i ; I) is a vector of poselet responses at location l i. Instead of only considering the poselets corresponding to the i- part, we also consider the poselet responses from other larger supporting parts. For example, if the i-th part corresponding to the left upper arm, one of the supporting part is the left arm. The feature vector Φ i (l i ; I) is a vector of poselet responses from all these parts (both left upper arm and upper arm in this case). The parameters w i re-weight the poselet responses from both part i and its supporting parts. Our intuition is that the appearance of larger parts (left arm) will provide some contextual information that help distinguishing small parts (left upper arm). This can be demonstrated in Fig. 2, where we show examples of space pruning with and without supporting larger parts. We can see that with contextual information provided from larger parts, the pruned states tend to peak around the ground-truth locations. (a) Fig. 2: An example of state pruning (right lower arm) by part pruners with (a) and without (b) support from large parts. Each figure shows the remaining states and the heat map of part state confidence (warm color indicates high confidence of remaining un-pruned). Part pruners are able to prune away at least 75% unlikely states. The remaining states in (a) concentrate around the correct location while those in (b) scatter all over the image. This indicates that large parts are helpful in resolving ambiguity of small parts (best viewed in color). Since there are no pairwise relations in part pruners, finding the MAP assignment can be done very efficiently by (b)

4 searching the best state for each part independently of other parts. We learn the parameters using the update rules in Eq. 4. In our work, we build three level of part pruners instead of just one level, which allow pruners at different levels to focus on different types of unlikely part states. This is in analog to the different weak learners used at various stages in the Viola- Jones detector [24]. We choose the trade-off parameter α such that after these three levels of part pruners, we filter more than 75% states while safely keeping the correct locations in the remaining state space most of the time Tree-based pruners The part pruners can only effectively prune a portion of the search space. If we set α to prune more aggressively, we will start filtering out the correct locations as well. In order to effectively prune more space, we need a more complex model. In this section, we introduce pruners based on hierarchical tree structured models shown in Fig. 1(right). This is a simplified structure of the loopy relational model used in [3] (Fig. 1(left)) by dropping edges in the loops. The model is more powerful than the part pruners at the expense of more expensive computation. But since the structure is a tree and the fact that we only need to consider states that pass the part pruners, the computation is still manageable. Given an example I and a configuration l = {l i } M i=1 of parts in the current state space, we use a compatibility function of the form Eq. 1. Similar to part pruners, the state space of a part is the 2D locations in the image. Bu the difference is that now the compatibility function has pairwise potentials that capture the spatial relationships between certain pairs of parts. Similarly, we learn the parameters of tree-based pruners using the update rules in 4. But now we need to consider pairwise potentials when computing the MAP assignment and max-marginals. We build four levels of tree-based pruners. As shown in the experiments (Sec. 5), there are about remaining states (2D locations) for each part after these levels. (a) (b) (c) (d) Fig. 3: Examples of remaining states of left lower arm and left lower leg after the tree-based pruners (3(a) and 3(b)) and after the enhanced tree-based pruners (3(c) and 3(d)). Once part pruners have been applied, tree-based pruners are applied to further prune away unlikely part states (more than 95%) and leave a small number of 2D part states (around D locations per part). However, when adding one more dimension of the part poselet index to the search spaces (8 to 15 poselets per part), the number of states per part is still relatively large ( actual states per part). We perform enhanced tree-based pruners to work on the full state representation to prune to about 500 states per part. This set of states is small enough that exhaustive search is affordable Enhanced tree-based pruners After the tree-based pruners, each part is left with about D locations to search over. We could apply a human parsing method (e.g. [3]) at this stage. However, the part configuration in [3] also involves a poselet index in addition to the 2D location. If we search over all part poselets (ranging from 8 poselets for the torso to 15 poselets for the half limbs), the search space is still too large (approximately states per part). Therefore, we develop an enhanced treebased pruners to prune the state space of the full representation (2D locations and poselet indices) used in [3]. We use the same tree structure in tree-based pruners (Fig. 1 (right)). But the difference is that the tree-based pruners in Sec. 3.2 only consider the (x, y) location of each part. In contrast, we now take the poselet index into the part state to make a full representation. Each state l i for a part i is a triple of l i = (x i, y i, z i ) corresponding its 2D location and the poselet index. In enhanced tree-based pruners, we also add part poselet priors and poselet co-occurrence priors to the compatibility function of Eq. 1. Yang et al. [2] have demonstrated the benefit of these priors for human parsing problems, although their notions of parts are small parts (patches around body joints). Let b i (z i ) be the poselet prior of poselet z i of part i, and b ij (z i, z j ) be the co-occurrence prior of poselet z i, z j of part i and part j. Now, the compatibility function becomes: C b (l; I; W T ) = B(l) + wi Φ i (l i ; I) + wijφ ij (l i, l j ; I) i V i,j E B(l) = b i (z i ) + b ij (z i, z j ) (4) i V i,j E where l = {l i = (x i, y i, z i )} M i=1 are the 2D locations and poselet indices of parts. Similarly, we use the update rules in Eq. 4 to learn the parameters. But the difference is that we use the compatibility function defined in Eq. 4. We build two levels of enhanced tree-based pruners. After these levels, we have only a small set of states (about 500 states per part). This set is small enough to build a parser using appearance models. Figure 3 shows an example of pruning after the tree-based pruners and the enhanced tree-based pruners. We will quantify the reduction rates after each level of the cascade in Sec Final Human Parser After the three pruning steps in Sec 3, we are left with a small number ( 500) of remaining states for each part. At this stage, we can afford to use complex models and pairwise features. Our parsing model is based on the hierarchical poselet model [3] which represents the configuration of the human pose using a 20-part model shown in Fig. 1 with a few more additional edges in the graph (details below). The configuration l i of the i-th part is parameterized by the (x, y) location and the poselet index of the i-th part.

5 Table 1: Reduction rates for rigid part after each type of the cascade of pruners. Pruners are sequentially applied to prune on the current set of states. Each row shows the percentage of part states are filtered out after that pruner level. For advanced tree pruners we tune the thresholds such that about 99.6% of part states that are removed after all these levels. This ensures the number of part states are small enough for the final parser. Part torso head u-arm l-arm u-leg l-leg Pruner Parts Trees Enhanced trees Table 2: P CP 0.2 rates for oracle parser after each type of the cascade of pruners. Oracle states are the best states (assuming access to the ground-truth) chosen from the pool of the current remaining part states (see details in section 5). After applying pruners, the oracle torso gets 76.5% P CP 0.2 and lower arms get 54.9% P CP 0.2 (Note that Sapp s pruner [5] gets 54% P CP 0.2 for lower arm on the Buffy dataset). Note that P CP 0.2 is a more restrictive criterion than P CP 0.5. The 76.5% P CP 0.2 rate for torso means that for 76.5% of the examples, the states after pruners contain at least one state whose distance to the ground-truth is within 20% of the length of the torso. Part Pruner torso head u-arm l-arm u-leg l-leg Parts Trees Enhanced trees In [3], the pairwise potentials are limited to simple spatial constraints since the inference algorithm (i.e. message passing) can be done efficiently for this type of pairwise potentials. Their method cannot exploit other richer pairwise potentials (e.g. color symmetry between left/right limbs) because the message passing algorithm involves O(K 2 ) computation. But in our case, we have pruned K to a very small number ( 500). We can now afford this O(K 2 ) computation using exhaustive search and exploit richer forms of pairwise potentials that depend on the image. We augment the graph in Fig. 1 by adding edges (not shown in Fig. 1 to keep it less cluttered) between left/right half-limbs and use the color symmetry as the pairwise appearance relations. Precomputing those pairwise potentials allows us to solve the MAP assignment efficiently. 4. IMPLEMENTATION DETAILS We describe some implementation details in this section. State space resolution: The 2D-location state space is on image grid of (about 4 pixels for each grid cell size). This 2D resolution is the same in all levels of pruners. For enhanced poselet pruners, we expand to the full state representation by adding the part poselet index dimension. Poselet dimension sizes vary between 8 to 15 depending on parts. At the beginning there are 10,000 2D-states per part (w.r.t the grid size ). Part pruners and tree-based pruners can filter out more than 95% states leaving about D-states. We then expand these 2D states to full representation by enumerating all poselet indexes. Then the enhanced tree-based pruners are applied to filter to about at most 500 states per part. The final human parsing algorithm is applied to search for the best pose configuration in the remaining set of states. Unary features: For all models (pruners and parsers), we use part poselet responses as unary features. Poselet responses are precomputed for all examples from pre-trained poselet templates (using SVM on HOG features). For part pruners, the unary features at each part are augmented with the poselet responses from supporting parts at the same part location. Pairwise features: For tree-based pruners and enhanced treebased pruners, there are no pairwise potentials for the color symmetry. We only use pairwise potentials to capture the spatial constraints between pairs of parts. We use the binning scheme in [8] to represent the spatial constraint. For enhanced tree-based pruners, we represent features for pairwise relations of part poselets by a vector of size (#poselet part i) (#poselet part j). The corresponding index in the vector of a pair of poselets z i, z j will be 1, others are zeros. For the parser, we use color appearance models for related primitive parts to capture the similarity of symmetric parts (e.g. limbs on the left are similar to limbs on the right). 5. EXPERIMENTS We evaluate our method on the challenging UIUC Sports dataset introduced by [3]. This dataset is an extension of the people dataset in [6] by adding more sport images downloaded from the Internet. There are totally 1299 examples of more than 20 sport categories including: badminton, acrobatics, cycling, American football, croquet, hockey, figure skating, soccer, golf, horseback riding, rugby, etc. Following [3], we use 650 images for training and 649 for testing Evaluation on the pruners A good pruner should remove most of the states without filtering out the ground-truth states. We use the PCP measurement (percentage of correctly localized parts) defined in [4] to evaluate the pruners. P CP k% means a predicted part is considered correct if its two end points lie within k% of the ground-truth length of the part. We evaluate our pruners by checking whether the pruned state space still contains a part within P CP 0.2 of the right answer (given by an oracle parser). The oracle parser will choose the best state for each part in the current state space. This essentially gives an upper bound performance of the parsing results on the pruned space. We show the reduction rates for rigid parts up to each pruner level in Table 1 and the corresponding P CP 0.2 rates by an oracle parser in Table 2.. We can see that after applying the three types of pruners, we can filter out 99.6% of part states (keeping about states per part) while still achieving high P CP 0.2 rates.

6 and small rigid parts. Large parts provide useful contextual information for small parts. The pruner cascade effectively filter out more than 99.6% part states to about 500 states per part. We have demonstrated an improvement of the final human parsing model on the pruned set using complex pairwise relationships between parts. 7. REFERENCES Fig. 4: Sample parsing results of our approach on the UIUC Sports dataset. Table 3: Comparisons of parsing results with other methods on the UIUC Sports dataset. The percentage of correctly estimated parts (P CP 0.5) rigid body parts. If two numbers are shown in one cell, they indicate the left/right body parts. Note that the improvement of our method over [3] shows the benefit of modeling color symmetries between left/right limbs as pairwise potentials in the model. Method torso upper leg lower leg upper arm lower arm head [8] [7] [2] [3] Ours Evaluation on human parsing Figure 4 shows some typical parsing examples by our method. To evaluate the final human parsing results, we use P CP 0.5, which is a measurement commonly used by previous approaches. We compare our parsing results with other state-ofthe-art methods in Table 3. The comparison with [3] is probably the most informative one, since [3] uses similar image features and model formulation, but without complex pairwise relationship of color symmetry. We can see that our method outperforms [3] for all the parts. This is a strong evidence of the benefit of using complex pairwise relationships in the pruned space. The only method that outperforms ours is the recent work by Yang and Ramanan [2]. However, we would like to point out that our work provides a general framework that can be adopted in any human parsing method. As future work, we plan to explore how to combine this framework with [2] to further improve the results. 6. CONCLUSIONS We have proposed a cascade of pruners using hierarchical poselets. These pruners can reduce the state space to allow the use of complex models. What distinguishes our work from [5] is that our pruners are learned using both large parts [1] Pedro F. Felzenszwalb and Daniel P. Huttenlocher, Pictorial structures for object recognition, IJCV, , 2 [2] Yi Yang and Deva Ramanan, Articulated pose estimation with flexible mixtures-of-parts, in CVPR, , 2, 4, 6 [3] Yang Wang, Duan Tran, and Zicheng Liao, Learning hierarchical poselets for human parsing, in CVPR, , 2, 3, 4, 5, 6 [4] Vittorio Ferrari, Manuel Marín-Jiménez, and Andrew Zisserman, Progressive search space reduction for human pose estimation, in CVPR, , 2, 3, 5 [5] Benjamin Sapp, Alexander Toshev, and Ben Taskar, Cascaded models for articulated pose estimation, in ECCV, , 2, 3, 5, 6 [6] Duan Tran and David Forsyth, Improved human parsing with a full relational model, in ECCV, , 5 [7] Mykhaylo Andriluka, Stefan Roth, and Bernt Schiele, Pictorial structures revisited: People detection and articulated pose estimation, in CVPR, , 2, 6 [8] Deva Ramanan, Learning to parse images of articulated bodies, in NIPS, , 2, 5, 6 [9] Xiaofeng Ren, Alexander Berg, and Jitendra Malik, Recovering human body configurations using pairwise constraints between parts, in ICCV, [10] Tai-Peng Tian and Stan Sclaroff, Fast globally optimal 2d human detection with loopy graph models, in CVPR, [11] Yang Wang and Greg Mori, Multiple tree models for occlusion and spatial constraints in human pose estimation, in ECCV, [12] Deva Ramanan and Cristian Sminchisescu, Training deformable models for localization, in CVPR, [13] Pawan Kumar, Andrew Zisserman, and Philip H. S. Torr, Efficient discriminative learning of parts-based models, in ICCV, [14] Vivek Kumar Singh, Ram Nevatia, and Chang Huang, Efficient inference with multiple heterogenous part detectors for human pose estimation, in ECCV, [15] Sam Johnson and Mark Everingham, Combining discriminative appearance and segmentation cues for articulated human pose estimation, in MLVMA, [16] Greg Mori, Xiaofeng Ren, Alyosha Efros, and Jitendra Malik, Recovering human body configuration: Combining segmentation and recognition, in CVPR, [17] Greg Mori, Guiding model search using segmentation, in ICCV, [18] Benjamin Sapp, Chris Jordan, and Ben Taskar, Adaptive pose priors for pictorial structures, in CVPR, [19] Praveen Srinivasan and Jianbo Shi, Bottom-up recognition and parsing of the human body, in CVPR, [20] Sam Johnson and Mark Everingham, Clustered pose and nonlinear appearance models for human pose estimation, in BMVC, [21] Min Sun and Silvio Savarese, Articulated part-base model for joint object detection and pose estimation, in ICCV, [22] Lubomir Bourdev, Subhransu Maji, Thomas Brox, and Jitendra Malik, Detecting people using mutually consistent poselet activations, in ECCV, [23] Lubomir Bourdev and Jitendra Malik, Poselets: Body part detectors training using 3d human pose annotations, in ICCV, [24] Paul Viola and Michael Jones, Rapid object detection using a boosted cascade of simple features, in CVPR,

Easy Minimax Estimation with Random Forests for Human Pose Estimation

Easy Minimax Estimation with Random Forests for Human Pose Estimation Easy Minimax Estimation with Random Forests for Human Pose Estimation P. Daphne Tsatsoulis and David Forsyth Department of Computer Science University of Illinois at Urbana-Champaign {tsatsou2, daf}@illinois.edu

More information

Poselet Conditioned Pictorial Structures

Poselet Conditioned Pictorial Structures Poselet Conditioned Pictorial Structures Leonid Pishchulin 1 Mykhaylo Andriluka 1 Peter Gehler 2 Bernt Schiele 1 1 Max Planck Institute for Informatics, Saarbrücken, Germany 2 Max Planck Institute for

More information

Improved Human Parsing with a Full Relational Model

Improved Human Parsing with a Full Relational Model Improved Human Parsing with a Full Relational Model Duan Tran and David Forsyth University of Illinois at Urbana-Champaign, USA {ddtran2,daf}@illinois.edu Abstract. We show quantitative evidence that a

More information

Strong Appearance and Expressive Spatial Models for Human Pose Estimation

Strong Appearance and Expressive Spatial Models for Human Pose Estimation 2013 IEEE International Conference on Computer Vision Strong Appearance and Expressive Spatial Models for Human Pose Estimation Leonid Pishchulin 1 Mykhaylo Andriluka 1 Peter Gehler 2 Bernt Schiele 1 1

More information

Articulated Pose Estimation using Discriminative Armlet Classifiers

Articulated Pose Estimation using Discriminative Armlet Classifiers Articulated Pose Estimation using Discriminative Armlet Classifiers Georgia Gkioxari 1, Pablo Arbeláez 1, Lubomir Bourdev 2 and Jitendra Malik 1 1 University of California, Berkeley - Berkeley, CA 94720

More information

Exploring the Spatial Hierarchy of Mixture Models for Human Pose Estimation

Exploring the Spatial Hierarchy of Mixture Models for Human Pose Estimation Exploring the Spatial Hierarchy of Mixture Models for Human Pose Estimation Yuandong Tian 1, C. Lawrence Zitnick 2, and Srinivasa G. Narasimhan 1 1 Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh,

More information

Human Pose Estimation in images and videos

Human Pose Estimation in images and videos Human Pose Estimation in images and videos Andrew Zisserman Department of Engineering Science University of Oxford, UK http://www.robots.ox.ac.uk/~vgg/ Objective and motivation Determine human body pose

More information

Learning to parse images of articulated bodies

Learning to parse images of articulated bodies Learning to parse images of articulated bodies Deva Ramanan Toyota Technological Institute at Chicago Chicago, IL 60637 ramanan@tti-c.org Abstract We consider the machine vision task of pose estimation

More information

Cascaded Models for Articulated Pose Estimation

Cascaded Models for Articulated Pose Estimation Cascaded Models for Articulated Pose Estimation Benjamin Sapp, Alexander Toshev, and Ben Taskar University of Pennsylvania, Philadelphia, PA 19104 USA {bensapp,toshev,taskar}@cis.upenn.edu Abstract. We

More information

Combining PGMs and Discriminative Models for Upper Body Pose Detection

Combining PGMs and Discriminative Models for Upper Body Pose Detection Combining PGMs and Discriminative Models for Upper Body Pose Detection Gedas Bertasius May 30, 2014 1 Introduction In this project, I utilized probabilistic graphical models together with discriminative

More information

Boosted Multiple Deformable Trees for Parsing Human Poses

Boosted Multiple Deformable Trees for Parsing Human Poses Boosted Multiple Deformable Trees for Parsing Human Poses Yang Wang and Greg Mori School of Computing Science Simon Fraser University Burnaby, BC, Canada {ywang12,mori}@cs.sfu.ca Abstract. Tree-structured

More information

Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation

Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation JOHNSON, EVERINGHAM: CLUSTERED MODELS FOR HUMAN POSE ESTIMATION 1 Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation Sam Johnson s.a.johnson04@leeds.ac.uk Mark Everingham m.everingham@leeds.ac.uk

More information

High Level Computer Vision

High Level Computer Vision High Level Computer Vision Part-Based Models for Object Class Recognition Part 2 Bernt Schiele - schiele@mpi-inf.mpg.de Mario Fritz - mfritz@mpi-inf.mpg.de http://www.d2.mpi-inf.mpg.de/cv Please Note No

More information

Combining Discriminative Appearance and Segmentation Cues for Articulated Human Pose Estimation

Combining Discriminative Appearance and Segmentation Cues for Articulated Human Pose Estimation Combining Discriminative Appearance and Segmentation Cues for Articulated Human Pose Estimation Sam Johnson and Mark Everingham School of Computing University of Leeds {mat4saj m.everingham}@leeds.ac.uk

More information

Estimating Human Pose in Images. Navraj Singh December 11, 2009

Estimating Human Pose in Images. Navraj Singh December 11, 2009 Estimating Human Pose in Images Navraj Singh December 11, 2009 Introduction This project attempts to improve the performance of an existing method of estimating the pose of humans in still images. Tasks

More information

Recognizing Human Actions from Still Images with Latent Poses

Recognizing Human Actions from Still Images with Latent Poses Recognizing Human Actions from Still Images with Latent Poses Weilong Yang, Yang Wang, and Greg Mori School of Computing Science Simon Fraser University Burnaby, BC, Canada wya16@sfu.ca, ywang12@cs.sfu.ca,

More information

Scale and Rotation Invariant Approach to Tracking Human Body Part Regions in Videos

Scale and Rotation Invariant Approach to Tracking Human Body Part Regions in Videos Scale and Rotation Invariant Approach to Tracking Human Body Part Regions in Videos Yihang Bo Institute of Automation, CAS & Boston College yihang.bo@gmail.com Hao Jiang Computer Science Department, Boston

More information

Part-Based Models for Object Class Recognition Part 2

Part-Based Models for Object Class Recognition Part 2 High Level Computer Vision Part-Based Models for Object Class Recognition Part 2 Bernt Schiele - schiele@mpi-inf.mpg.de Mario Fritz - mfritz@mpi-inf.mpg.de https://www.mpi-inf.mpg.de/hlcv Class of Object

More information

Part-Based Models for Object Class Recognition Part 2

Part-Based Models for Object Class Recognition Part 2 High Level Computer Vision Part-Based Models for Object Class Recognition Part 2 Bernt Schiele - schiele@mpi-inf.mpg.de Mario Fritz - mfritz@mpi-inf.mpg.de https://www.mpi-inf.mpg.de/hlcv Class of Object

More information

Using k-poselets for detecting people and localizing their keypoints

Using k-poselets for detecting people and localizing their keypoints Using k-poselets for detecting people and localizing their keypoints Georgia Gkioxari, Bharath Hariharan, Ross Girshick and itendra Malik University of California, Berkeley - Berkeley, CA 94720 {gkioxari,bharath2,rbg,malik}@berkeley.edu

More information

Detecting Actions, Poses, and Objects with Relational Phraselets

Detecting Actions, Poses, and Objects with Relational Phraselets Detecting Actions, Poses, and Objects with Relational Phraselets Chaitanya Desai and Deva Ramanan University of California at Irvine, Irvine CA, USA {desaic,dramanan}@ics.uci.edu Abstract. We present a

More information

Structured Models in. Dan Huttenlocher. June 2010

Structured Models in. Dan Huttenlocher. June 2010 Structured Models in Computer Vision i Dan Huttenlocher June 2010 Structured Models Problems where output variables are mutually dependent or constrained E.g., spatial or temporal relations Such dependencies

More information

End-to-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation

End-to-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation End-to-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation Wei Yang Wanli Ouyang Hongsheng Li Xiaogang Wang Department of Electronic Engineering,

More information

Combining Local Appearance and Holistic View: Dual-Source Deep Neural Networks for Human Pose Estimation

Combining Local Appearance and Holistic View: Dual-Source Deep Neural Networks for Human Pose Estimation Combining Local Appearance and Holistic View: Dual-Source Deep Neural Networks for Human Pose Estimation Xiaochuan Fan, Kang Zheng, Yuewei Lin, Song Wang Department of Computer Science & Engineering, University

More information

Fast Online Upper Body Pose Estimation from Video

Fast Online Upper Body Pose Estimation from Video CHANG et al.: FAST ONLINE UPPER BODY POSE ESTIMATION FROM VIDEO 1 Fast Online Upper Body Pose Estimation from Video Ming-Ching Chang 1,2 changm@ge.com Honggang Qi 1,3 hgqi@ucas.ac.cn Xin Wang 1 xwang26@albany.edu

More information

Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations

Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations Xianjie Chen University of California, Los Angeles Los Angeles, CA 90024 cxj@ucla.edu Alan Yuille University of

More information

MODEC: Multimodal Decomposable Models for Human Pose Estimation

MODEC: Multimodal Decomposable Models for Human Pose Estimation 13 IEEE Conference on Computer Vision and Pattern Recognition MODEC: Multimodal Decomposable Models for Human Pose Estimation Ben Sapp Google, Inc bensapp@google.com Ben Taskar University of Washington

More information

Multiple Pose Context Trees for estimating Human Pose in Object Context

Multiple Pose Context Trees for estimating Human Pose in Object Context Multiple Pose Context Trees for estimating Human Pose in Object Context Vivek Kumar Singh Furqan Muhammad Khan Ram Nevatia University of Southern California Los Angeles, CA {viveksin furqankh nevatia}@usc.edu

More information

Qualitative Pose Estimation by Discriminative Deformable Part Models

Qualitative Pose Estimation by Discriminative Deformable Part Models Qualitative Pose Estimation by Discriminative Deformable Part Models Hyungtae Lee, Vlad I. Morariu, and Larry S. Davis University of Maryland, College Park Abstract. We present a discriminative deformable

More information

Deformable Part Models

Deformable Part Models CS 1674: Intro to Computer Vision Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 9, 2016 Today: Object category detection Window-based approaches: Last time: Viola-Jones

More information

Human Context: Modeling human-human interactions for monocular 3D pose estimation

Human Context: Modeling human-human interactions for monocular 3D pose estimation Human Context: Modeling human-human interactions for monocular 3D pose estimation Mykhaylo Andriluka Leonid Sigal Max Planck Institute for Informatics, Saarbrücken, Germany Disney Research, Pittsburgh,

More information

Human Pose Estimation using Body Parts Dependent Joint Regressors

Human Pose Estimation using Body Parts Dependent Joint Regressors 2013 IEEE Conference on Computer Vision and Pattern Recognition Human Pose Estimation using Body Parts Dependent Joint Regressors Matthias Dantone 1 Juergen Gall 2 Christian Leistner 3 Luc Van Gool 1 ETH

More information

Detecting and Parsing of Visual Objects: Humans and Animals. Alan Yuille (UCLA)

Detecting and Parsing of Visual Objects: Humans and Animals. Alan Yuille (UCLA) Detecting and Parsing of Visual Objects: Humans and Animals Alan Yuille (UCLA) Summary This talk describes recent work on detection and parsing visual objects. The methods represent objects in terms of

More information

Is 2D Information Enough For Viewpoint Estimation? Amir Ghodrati, Marco Pedersoli, Tinne Tuytelaars BMVC 2014

Is 2D Information Enough For Viewpoint Estimation? Amir Ghodrati, Marco Pedersoli, Tinne Tuytelaars BMVC 2014 Is 2D Information Enough For Viewpoint Estimation? Amir Ghodrati, Marco Pedersoli, Tinne Tuytelaars BMVC 2014 Problem Definition Viewpoint estimation: Given an image, predicting viewpoint for object of

More information

Category-level localization

Category-level localization Category-level localization Cordelia Schmid Recognition Classification Object present/absent in an image Often presence of a significant amount of background clutter Localization / Detection Localize object

More information

Learning and Recognizing Visual Object Categories Without First Detecting Features

Learning and Recognizing Visual Object Categories Without First Detecting Features Learning and Recognizing Visual Object Categories Without First Detecting Features Daniel Huttenlocher 2007 Joint work with D. Crandall and P. Felzenszwalb Object Category Recognition Generic classes rather

More information

Object Category Detection. Slides mostly from Derek Hoiem

Object Category Detection. Slides mostly from Derek Hoiem Object Category Detection Slides mostly from Derek Hoiem Today s class: Object Category Detection Overview of object category detection Statistical template matching with sliding window Part-based Models

More information

Human Pose Estimation using Global and Local Normalization. Ke Sun, Cuiling Lan, Junliang Xing, Wenjun Zeng, Dong Liu, Jingdong Wang

Human Pose Estimation using Global and Local Normalization. Ke Sun, Cuiling Lan, Junliang Xing, Wenjun Zeng, Dong Liu, Jingdong Wang Human Pose Estimation using Global and Local Normalization Ke Sun, Cuiling Lan, Junliang Xing, Wenjun Zeng, Dong Liu, Jingdong Wang Overview of the supplementary material In this supplementary material,

More information

Part-Based Models for Object Class Recognition Part 3

Part-Based Models for Object Class Recognition Part 3 High Level Computer Vision! Part-Based Models for Object Class Recognition Part 3 Bernt Schiele - schiele@mpi-inf.mpg.de Mario Fritz - mfritz@mpi-inf.mpg.de! http://www.d2.mpi-inf.mpg.de/cv ! State-of-the-Art

More information

Pose Machines: Articulated Pose Estimation via Inference Machines

Pose Machines: Articulated Pose Estimation via Inference Machines Pose Machines: Articulated Pose Estimation via Inference Machines Varun Ramakrishna, Daniel Munoz, Martial Hebert, James Andrew Bagnell, and Yaser Sheikh The Robotics Institute, Carnegie Mellon University,

More information

Better appearance models for pictorial structures

Better appearance models for pictorial structures EICHNER, FERRARI: BETTER APPEARANCE MODELS FOR PICTORIAL STRUCTURES 1 Better appearance models for pictorial structures Marcin Eichner eichner@vision.ee.ethz.ch Vittorio Ferrari ferrari@vision.ee.ethz.ch

More information

Joint Segmentation and Pose Tracking of Human in Natural Videos

Joint Segmentation and Pose Tracking of Human in Natural Videos 23 IEEE International Conference on Computer Vision Joint Segmentation and Pose Tracking of Human in Natural Videos Taegyu Lim,2 Seunghoon Hong 2 Bohyung Han 2 Joon Hee Han 2 DMC R&D Center, Samsung Electronics,

More information

Measure Locally, Reason Globally: Occlusion-sensitive Articulated Pose Estimation

Measure Locally, Reason Globally: Occlusion-sensitive Articulated Pose Estimation Measure Locally, Reason Globally: Occlusion-sensitive Articulated Pose Estimation Leonid Sigal Michael J. Black Department of Computer Science, Brown University, Providence, RI 02912 {ls,black}@cs.brown.edu

More information

Shape-based Pedestrian Parsing

Shape-based Pedestrian Parsing Shape-based Pedestrian Parsing Yihang Bo School of Computer and Information Technology Beijing Jiaotong University, China yihang.bo@gmail.com Charless C. Fowlkes Department of Computer Science University

More information

Object Recognition with Deformable Models

Object Recognition with Deformable Models Object Recognition with Deformable Models Pedro F. Felzenszwalb Department of Computer Science University of Chicago Joint work with: Dan Huttenlocher, Joshua Schwartz, David McAllester, Deva Ramanan.

More information

CS229 Final Project Report. A Multi-Task Feature Learning Approach to Human Detection. Tiffany Low

CS229 Final Project Report. A Multi-Task Feature Learning Approach to Human Detection. Tiffany Low CS229 Final Project Report A Multi-Task Feature Learning Approach to Human Detection Tiffany Low tlow@stanford.edu Abstract We focus on the task of human detection using unsupervised pre-trained neutral

More information

Global Pose Estimation Using Non-Tree Models

Global Pose Estimation Using Non-Tree Models Global Pose Estimation Using Non-Tree Models Hao Jiang and David R. Martin Computer Science Department, Boston College Chestnut Hill, MA 02467, USA hjiang@cs.bc.edu, dmartin@cs.bc.edu Abstract We propose

More information

Strike a Pose: Tracking People by Finding Stylized Poses

Strike a Pose: Tracking People by Finding Stylized Poses Strike a Pose: Tracking People by Finding Stylized Poses Deva Ramanan 1 and D. A. Forsyth 1 and Andrew Zisserman 2 1 University of California, Berkeley Berkeley, CA 94720 2 University of Oxford Oxford,

More information

Object Detection by 3D Aspectlets and Occlusion Reasoning

Object Detection by 3D Aspectlets and Occlusion Reasoning Object Detection by 3D Aspectlets and Occlusion Reasoning Yu Xiang University of Michigan Silvio Savarese Stanford University In the 4th International IEEE Workshop on 3D Representation and Recognition

More information

Efficient Detector Adaptation for Object Detection in a Video

Efficient Detector Adaptation for Object Detection in a Video 2013 IEEE Conference on Computer Vision and Pattern Recognition Efficient Detector Adaptation for Object Detection in a Video Pramod Sharma and Ram Nevatia Institute for Robotics and Intelligent Systems,

More information

Object Detection with Partial Occlusion Based on a Deformable Parts-Based Model

Object Detection with Partial Occlusion Based on a Deformable Parts-Based Model Object Detection with Partial Occlusion Based on a Deformable Parts-Based Model Johnson Hsieh (johnsonhsieh@gmail.com), Alexander Chia (alexchia@stanford.edu) Abstract -- Object occlusion presents a major

More information

Graphical Models for Computer Vision

Graphical Models for Computer Vision Graphical Models for Computer Vision Pedro F Felzenszwalb Brown University Joint work with Dan Huttenlocher, Joshua Schwartz, Ross Girshick, David McAllester, Deva Ramanan, Allie Shapiro, John Oberlin

More information

Parsing Human Motion with Stretchable Models

Parsing Human Motion with Stretchable Models Parsing Human Motion with Stretchable Models Benjamin Sapp David Weiss Ben Taskar University of Pennsylvania Philadelphia, PA 19104 {bensapp,djweiss,taskar}@cis.upenn.edu Abstract We address the problem

More information

Object Category Detection: Sliding Windows

Object Category Detection: Sliding Windows 04/10/12 Object Category Detection: Sliding Windows Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem Today s class: Object Category Detection Overview of object category detection Statistical

More information

Test-time Adaptation for 3D Human Pose Estimation

Test-time Adaptation for 3D Human Pose Estimation Test-time Adaptation for 3D Human Pose Estimation Sikandar Amin,2, Philipp Müller 2, Andreas Bulling 2, and Mykhaylo Andriluka 2,3 Technische Universität München, Germany 2 Max Planck Institute for Informatics,

More information

Ensemble Convolutional Neural Networks for Pose Estimation

Ensemble Convolutional Neural Networks for Pose Estimation Ensemble Convolutional Neural Networks for Pose Estimation Yuki Kawana b, Norimichi Ukita a,, Jia-Bin Huang c, Ming-Hsuan Yang d a Toyota Technological Institute, Japan b Nara Institute of Science and

More information

Lecture 18: Human Motion Recognition

Lecture 18: Human Motion Recognition Lecture 18: Human Motion Recognition Professor Fei Fei Li Stanford Vision Lab 1 What we will learn today? Introduction Motion classification using template matching Motion classification i using spatio

More information

Estimating Human Pose with Flowing Puppets

Estimating Human Pose with Flowing Puppets 2013 IEEE International Conference on Computer Vision Estimating Human Pose with Flowing Puppets Silvia Zuffi 1,3 Javier Romero 2 Cordelia Schmid 4 Michael J. Black 2 1 Department of Computer Science,

More information

Multiple-Person Tracking by Detection

Multiple-Person Tracking by Detection http://excel.fit.vutbr.cz Multiple-Person Tracking by Detection Jakub Vojvoda* Abstract Detection and tracking of multiple person is challenging problem mainly due to complexity of scene and large intra-class

More information

Beyond Actions: Discriminative Models for Contextual Group Activities

Beyond Actions: Discriminative Models for Contextual Group Activities Beyond Actions: Discriminative Models for Contextual Group Activities Tian Lan School of Computing Science Simon Fraser University tla58@sfu.ca Weilong Yang School of Computing Science Simon Fraser University

More information

Segmentation. Bottom up Segmentation Semantic Segmentation

Segmentation. Bottom up Segmentation Semantic Segmentation Segmentation Bottom up Segmentation Semantic Segmentation Semantic Labeling of Street Scenes Ground Truth Labels 11 classes, almost all occur simultaneously, large changes in viewpoint, scale sky, road,

More information

A novel template matching method for human detection

A novel template matching method for human detection University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2009 A novel template matching method for human detection Duc Thanh Nguyen

More information

Object Detection with Discriminatively Trained Part Based Models

Object Detection with Discriminatively Trained Part Based Models Object Detection with Discriminatively Trained Part Based Models Pedro F. Felzenszwelb, Ross B. Girshick, David McAllester and Deva Ramanan Presented by Fabricio Santolin da Silva Kaustav Basu Some slides

More information

Modern Object Detection. Most slides from Ali Farhadi

Modern Object Detection. Most slides from Ali Farhadi Modern Object Detection Most slides from Ali Farhadi Comparison of Classifiers assuming x in {0 1} Learning Objective Training Inference Naïve Bayes maximize j i logp + logp ( x y ; θ ) ( y ; θ ) i ij

More information

A Discriminatively Trained, Multiscale, Deformable Part Model

A Discriminatively Trained, Multiscale, Deformable Part Model A Discriminatively Trained, Multiscale, Deformable Part Model by Pedro Felzenszwalb, David McAllester, and Deva Ramanan CS381V Visual Recognition - Paper Presentation Slide credit: Duan Tran Slide credit:

More information

CS 664 Flexible Templates. Daniel Huttenlocher

CS 664 Flexible Templates. Daniel Huttenlocher CS 664 Flexible Templates Daniel Huttenlocher Flexible Template Matching Pictorial structures Parts connected by springs and appearance models for each part Used for human bodies, faces Fischler&Elschlager,

More information

Finding people in repeated shots of the same scene

Finding people in repeated shots of the same scene 1 Finding people in repeated shots of the same scene Josef Sivic 1 C. Lawrence Zitnick Richard Szeliski 1 University of Oxford Microsoft Research Abstract The goal of this work is to find all occurrences

More information

Human Upper Body Pose Estimation in Static Images

Human Upper Body Pose Estimation in Static Images 1. Research Team Human Upper Body Pose Estimation in Static Images Project Leader: Graduate Students: Prof. Isaac Cohen, Computer Science Mun Wai Lee 2. Statement of Project Goals This goal of this project

More information

Body Parts Dependent Joint Regressors for Human Pose Estimation in Still Images

Body Parts Dependent Joint Regressors for Human Pose Estimation in Still Images Research Collection Journal Article Body Parts Dependent Joint Regressors for Human Pose Estimation in Still Images Author(s): Dantone, Matthias; Gall, Juergen; Leistner, Christian; Van Gool, Luc Publication

More information

Segmenting Objects in Weakly Labeled Videos

Segmenting Objects in Weakly Labeled Videos Segmenting Objects in Weakly Labeled Videos Mrigank Rochan, Shafin Rahman, Neil D.B. Bruce, Yang Wang Department of Computer Science University of Manitoba Winnipeg, Canada {mrochan, shafin12, bruce, ywang}@cs.umanitoba.ca

More information

DPM Configurations for Human Interaction Detection

DPM Configurations for Human Interaction Detection DPM Configurations for Human Interaction Detection Coert van Gemeren Ronald Poppe Remco C. Veltkamp Interaction Technology Group, Department of Information and Computing Sciences, Utrecht University, The

More information

Integrating Grammar and Segmentation for Human Pose Estimation

Integrating Grammar and Segmentation for Human Pose Estimation 2013 IEEE Conference on Computer Vision and Pattern Recognition Integrating Grammar and Segmentation for Human Pose Estimation Brandon Rothrock 1, Seyoung Park 1 and Song-Chun Zhu 1,2 1 Department of Computer

More information

CS 231A Computer Vision (Fall 2011) Problem Set 4

CS 231A Computer Vision (Fall 2011) Problem Set 4 CS 231A Computer Vision (Fall 2011) Problem Set 4 Due: Nov. 30 th, 2011 (9:30am) 1 Part-based models for Object Recognition (50 points) One approach to object recognition is to use a deformable part-based

More information

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009 Learning and Inferring Depth from Monocular Images Jiyan Pan April 1, 2009 Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer

More information

Local cues and global constraints in image understanding

Local cues and global constraints in image understanding Local cues and global constraints in image understanding Olga Barinova Lomonosov Moscow State University *Many slides adopted from the courses of Anton Konushin Image understanding «To see means to know

More information

Find that! Visual Object Detection Primer

Find that! Visual Object Detection Primer Find that! Visual Object Detection Primer SkTech/MIT Innovation Workshop August 16, 2012 Dr. Tomasz Malisiewicz tomasz@csail.mit.edu Find that! Your Goals...imagine one such system that drives information

More information

Iterative Action and Pose Recognition using Global-and-Pose Features and Action-specific Models

Iterative Action and Pose Recognition using Global-and-Pose Features and Action-specific Models Iterative Action and Pose Recognition using Global-and-Pose Features and Action-specific Models Norimichi Ukita Nara Institute of Science and Technology ukita@ieee.org Abstract This paper proposes an iterative

More information

A Hierarchical Compositional System for Rapid Object Detection

A Hierarchical Compositional System for Rapid Object Detection A Hierarchical Compositional System for Rapid Object Detection Long Zhu and Alan Yuille Department of Statistics University of California at Los Angeles Los Angeles, CA 90095 {lzhu,yuille}@stat.ucla.edu

More information

Learning Realistic Human Actions from Movies

Learning Realistic Human Actions from Movies Learning Realistic Human Actions from Movies Ivan Laptev*, Marcin Marszałek**, Cordelia Schmid**, Benjamin Rozenfeld*** INRIA Rennes, France ** INRIA Grenoble, France *** Bar-Ilan University, Israel Presented

More information

Image Retrieval with Structured Object Queries Using Latent Ranking SVM

Image Retrieval with Structured Object Queries Using Latent Ranking SVM Image Retrieval with Structured Object Queries Using Latent Ranking SVM Tian Lan 1, Weilong Yang 1, Yang Wang 2, and Greg Mori 1 Simon Fraser University 1 University of Manitoba 2 {tla58,wya16,mori}@cs.sfu.ca

More information

Multi-view Body Part Recognition with Random Forests

Multi-view Body Part Recognition with Random Forests KAZEMI, BURENIUS, AZIZPOUR, SULLIVAN: MULTI-VIEW BODY PART RECOGNITION 1 Multi-view Body Part Recognition with Random Forests Vahid Kazemi vahidk@csc.kth.se Magnus Burenius burenius@csc.kth.se Hossein

More information

Tri-modal Human Body Segmentation

Tri-modal Human Body Segmentation Tri-modal Human Body Segmentation Master of Science Thesis Cristina Palmero Cantariño Advisor: Sergio Escalera Guerrero February 6, 2014 Outline 1 Introduction 2 Tri-modal dataset 3 Proposed baseline 4

More information

Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos

Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos Jie Song Limin Wang2 AIT Lab, ETH Zurich Luc Van Gool2 2 Otmar Hilliges Computer Vision Lab, ETH Zurich Abstract Deep ConvNets

More information

Human Pose Estimation Using Consistent Max-Covering

Human Pose Estimation Using Consistent Max-Covering Human Pose Estimation Using Consistent Max-Covering Hao Jiang Computer Science Department, Boston College Chestnut Hill, MA 2467, USA hjiang@cs.bc.edu Abstract We propose a novel consistent max-covering

More information

and we can apply the recursive definition of the cost-to-go function to get

and we can apply the recursive definition of the cost-to-go function to get Section 20.2 Parsing People in Images 602 and we can apply the recursive definition of the cost-to-go function to get argmax f X 1,...,X n chain (X 1,...,X n ) = argmax ( ) f X 1 chain (X 1 )+fcost-to-go

More information

Occlusion Patterns for Object Class Detection

Occlusion Patterns for Object Class Detection 23 IEEE Conference on Computer Vision and Pattern Recognition Occlusion Patterns for Object Class Detection Bojan Pepik Michael Stark,2 Peter Gehler 3 Bernt Schiele Max Planck Institute for Informatics,

More information

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation M. Blauth, E. Kraft, F. Hirschenberger, M. Böhm Fraunhofer Institute for Industrial Mathematics, Fraunhofer-Platz 1,

More information

Action Recognition in Cluttered Dynamic Scenes using Pose-Specific Part Models

Action Recognition in Cluttered Dynamic Scenes using Pose-Specific Part Models Action Recognition in Cluttered Dynamic Scenes using Pose-Specific Part Models Vivek Kumar Singh University of Southern California Los Angeles, CA, USA viveksin@usc.edu Ram Nevatia University of Southern

More information

Window based detectors

Window based detectors Window based detectors CS 554 Computer Vision Pinar Duygulu Bilkent University (Source: James Hays, Brown) Today Window-based generic object detection basic pipeline boosting classifiers face detection

More information

Human detection using local shape and nonredundant

Human detection using local shape and nonredundant University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2010 Human detection using local shape and nonredundant binary patterns

More information

People detection in complex scene using a cascade of Boosted classifiers based on Haar-like-features

People detection in complex scene using a cascade of Boosted classifiers based on Haar-like-features People detection in complex scene using a cascade of Boosted classifiers based on Haar-like-features M. Siala 1, N. Khlifa 1, F. Bremond 2, K. Hamrouni 1 1. Research Unit in Signal Processing, Image Processing

More information

Discriminative Figure-Centric Models for Joint Action Localization and Recognition

Discriminative Figure-Centric Models for Joint Action Localization and Recognition Discriminative Figure-Centric Models for Joint Action Localization and Recognition Tian Lan School of Computing Science Simon Fraser University tla58@sfu.ca Yang Wang Dept. of Computer Science UIUC yangwang@uiuc.edu

More information

https://en.wikipedia.org/wiki/the_dress Recap: Viola-Jones sliding window detector Fast detection through two mechanisms Quickly eliminate unlikely windows Use features that are fast to compute Viola

More information

Modeling 3D viewpoint for part-based object recognition of rigid objects

Modeling 3D viewpoint for part-based object recognition of rigid objects Modeling 3D viewpoint for part-based object recognition of rigid objects Joshua Schwartz Department of Computer Science Cornell University jdvs@cs.cornell.edu Abstract Part-based object models based on

More information

Learning People Detectors for Tracking in Crowded Scenes

Learning People Detectors for Tracking in Crowded Scenes Learning People Detectors for Tracking in Crowded Scenes Siyu Tang1 Mykhaylo Andriluka1 Konrad Schindler3 Stefan Roth2 1 Anton Milan2 Bernt Schiele1 Max Planck Institute for Informatics, Saarbru cken,

More information

Object detection using non-redundant local Binary Patterns

Object detection using non-redundant local Binary Patterns University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2010 Object detection using non-redundant local Binary Patterns Duc Thanh

More information

on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015

on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015 on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015 Vector visual representation Fixed-size image representation High-dim (100 100,000) Generic, unsupervised: BoW,

More information

Histogram of Oriented Gradients (HOG) for Object Detection

Histogram of Oriented Gradients (HOG) for Object Detection Histogram of Oriented Gradients (HOG) for Object Detection Navneet DALAL Joint work with Bill TRIGGS and Cordelia SCHMID Goal & Challenges Goal: Detect and localise people in images and videos n Wide variety

More information

Separating Objects and Clutter in Indoor Scenes

Separating Objects and Clutter in Indoor Scenes Separating Objects and Clutter in Indoor Scenes Salman H. Khan School of Computer Science & Software Engineering, The University of Western Australia Co-authors: Xuming He, Mohammed Bennamoun, Ferdous

More information

arxiv: v3 [cs.cv] 9 Jun 2015

arxiv: v3 [cs.cv] 9 Jun 2015 Efficient Object Localization Using Convolutional Networks Jonathan Tompson, Ross Goroshin, Arjun Jain, Yann LeCun, Christoph Bregler New York University tompson/goroshin/ajain/lecun/bregler@cims.nyu.edu

More information