Coupling-and-Decoupling: A Hierarchical Model for Occlusion-Free Car Detection

Bo Li 1,2,3, Tianfu Wu 2,3, Wenze Hu 3,4 and Mingtao Pei 1

1 Beijing Lab of Intelligent Information, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, P.R. China
2 BUPT-Seesoft Joint Lab of Visual Computing and Image Communication, Beijing University of Posts and Telecommunications (BUPT), Beijing, P.R. China
3 Lotus Hill Research Institute, EZhou, P.R. China
4 Department of Statistics, University of California, Los Angeles
{boli.lhi, tfwu.lhi, wzhu.lhi}@gmail.com, peimt@bit.edu.cn

Abstract. Handling occlusions in object detection is a long-standing problem. This paper addresses the problem of X-to-X-occlusion-free object detection (e.g., car-to-car occlusions in our experiments) by utilizing an intuitive coupling-and-decoupling strategy. In the coupling stage, we model the pair of occluding X's (e.g., car pairs) directly to account for their statistically strong co-occurrence (i.e., coupling). We then learn a hierarchical And-Or directed acyclic graph (AOG) model under the latent structural SVM (LSSVM) framework. The learned AOG consists of, from top to bottom, (i) a root Or-node representing different compositions of occluding X pairs, (ii) a set of And-nodes, each of which represents a specific composition of occluding X pairs, (iii) another set of And-nodes representing single X's decomposed from the occluding X pairs, and (iv) a set of terminal-nodes which represent the appearance templates for the X pairs, the single X's and the latent parts of the single X's, respectively. The part appearance templates can also be shared among different single X's. In detection, a dynamic programming (DP) algorithm is used, and as a natural consequence the two single X's are decoupled from the X-to-X occluding pairs. In experiments, we test our method on roadside cars collected by ourselves from a real traffic video surveillance environment. We compare our model with the state-of-the-art deformable part-based model (DPM) and obtain better detection performance.

1 Introduction

In the literature of object detection, handling occlusions is very challenging and remains a long-standing problem. There are two main reasons. (i) The gap between training and testing. When training an object detector, unoccluded object instances are often collected and used purposely. In testing, however, occlusions are inevitable in real scenarios. As a result, detection performance degrades significantly as occlusions become severe. (ii) The lack of common occlusion models. Generally and statistically speaking, it is very difficult to capture and predict occlusions because, in the wildest situation, they can be treated as being uniformly distributed.

To some extent, this explains, in turn, why the gap between training and testing exists. To address the occlusion problem, among others, hierarchical modeling (e.g., deformable part-based models [5]) has been widely used and shows performance improvements; a 2-layer model is often adopted for modeling single objects, which can tackle small occlusions implicitly.

Fig. 1. Some examples of roadside cars. There are different types of car-to-car occlusions which challenge state-of-the-art detectors trained for single cars.

In this paper, we distinguish between two types of occlusions: X-to-X and X-to-Y occlusions, where X and Y represent different object categories (e.g., X represents car and Y person), and then present a coupling-and-decoupling method for X-to-X-occlusion-free object detection without modeling occlusions explicitly. As the running example, we use roadside cars, which are often parked along the curb, leading to X-to-X occlusions. Occlusion-free roadside car detection can facilitate many important applications in computer vision and intelligent transportation, such as parking violation capturing, license plate detection and parking management. Figure 1 shows some examples of car-to-car occlusions in a real traffic video surveillance environment. In the sequel, we concretely use car instead of X to present the formulation (but note that the proposed method is not limited to cars). Our method consists of the following two stages.

(i) The coupling stage in modeling and learning. Instead of training a single object detector, we learn a hierarchical And-Or directed acyclic graph (AOG) model for the car-to-car occluding pairs directly, to account for their statistically strong coupling. The learned AOG consists of, from top to bottom, (i) a root Or-node representing different compositions of occluding car-to-car pairs, (ii) a set of And-nodes, each of which represents a specific composition of occluding car pairs, (iii) another set of And-nodes representing single cars decomposed from occluding car pairs, and (iv) a set of terminal-nodes which represent the appearance templates for the car pairs, the single cars and the latent parts of the single cars, respectively. The part appearance templates can also be shared among different single cars. We adopt Histograms of Oriented Gradients (HOG) [2] as the appearance feature, as done in the DPM [5]. Figure 3 shows the learned AOG for car-to-car pairs (where, for clarity, only a portion is drawn). We formulate the learning of the AOG under the latent structural SVM (LSSVM) framework [13, 14, 16].

In the training dataset, bounding boxes of car pairs and of the corresponding two single cars are annotated, and the parts of the single cars are treated as latent variables.

(ii) The decoupling stage in detection. Our AOG is directed and acyclic, so we can utilize the DP algorithm in inference. For detected car pairs, the back-traced bounding boxes of the two single cars are obtained, i.e., decoupled from the car pair. Since the locations and sizes of the bounding boxes of the single cars are annotated when jointly training the AOG model, the back-traced ones are the optimal solutions for the two single cars.

Fig. 2. Top-left: The population ratios in the testing set of roadside cars used in this paper. Bottom: Some examples of cropped car-to-car occluding pairs. The occlusion ratio is measured for the back car in the car pairs. Top-right: The plots of detection rate vs. occlusion ratio, where the blue dashed curve is for the state-of-the-art DPM [5] and the red curve is for the proposed method. See text for details.

To illustrate the necessity and the advantage of the proposed method, the top-left panel of Fig. 2 shows the population ratios of car-to-car pairs with different degrees of occlusion in the testing dataset collected by ourselves from a real traffic video surveillance environment. Some cropped image examples are shown at the bottom. The top-right panel shows the detection rates against the occlusion ratio for the proposed method (red curve) and the state-of-the-art DPM [5] (blue dashed curve). We can observe that: (i) the population ratio of car pairs with occlusion ratio equal to or greater than 0.2 is greater than 0.5 (i.e., occlusions become a statistically major factor); (ii) at the same time, the detection performance of the DPM drops significantly when occlusions go beyond 0.2, while our method obtains much better performance; (iii) the detection performance of our method goes up significantly when occlusions become severe.

This is because, with those severe occlusions, even if the DPM could recall the two single cars, their bounding boxes overlap by more than the threshold normally used (e.g., 0.7), and then the one with the lower score is excluded by non-maximum suppression (NMS) (see the DPM detection results in Fig. 5). Our method can, however, detect those cars correctly by decoupling them from the detected car pairs. More results and the final performance comparison are shown in Fig. 5 and Fig. 4, respectively.

In the literature of computer vision, car detection for traffic monitoring systems is addressed mainly in single, unoccluded situations, such as car type classification [12, 8], multiple-view car detection [9, 7], or shadow removal from suspicious car regions in images [10]. [1] proposed a method to detect and track multiple cars simultaneously, but did not address the occlusion problem.

Fig. 3. Our AOG model. First layer: illustration of car pair And-nodes and their corresponding appearance features. Second layer: illustration of single car And-nodes and their corresponding appearance features. Third layer: illustration of car part terminal-nodes and their corresponding appearance and deformation features. Parts are shared. For clarity, we only show the parts of two single cars.

2 The Model

2.1 The AOG

In this section, we specify the hierarchical AOG model used in this paper, which is a directed acyclic graph facilitating the DP algorithm in detection. The learning of the AOG will be given in Sec. 4. Following the framework in [17], our AOG embeds the occluding car pair detection grammars, which are embodied by defining three types of nodes:

(i) The root Or-node $O$ represents compositional alternatives of the occluding car pairs (e.g., car pairs from different viewpoints or with different degrees of occlusion). The Or-node $O$ has a branching variable, denoted by $\omega(O)$, indicating which child And-node is selected; $\omega(O)$ will be inferred on-the-fly in detection.

(ii) A set of And-nodes $V_{And}$. There are two types of And-nodes: car pairs and single cars. Each car pair And-node represents the decomposition of a specific type of occluding car pair into two single cars (e.g., a frontal-view car pair with the back car occluded by roughly 30%), and each single car And-node represents the decomposition of a single car into a small number of parts.

(iii) A set of terminal-nodes $V_T$. First, the And-nodes defined above can themselves terminate directly, creating terminal-nodes, when the resolution is low (relative to their own decomposed parts). Second, each part is represented by a terminal-node linking to the image data. In the model, each terminal-node $t \in V_T$ has its own location, denoted by $l_t$, which will also be inferred on-the-fly in detection. The location for placing an And-node is the same as that of the terminal-node directly terminated from it.

In the AOG, terminal-nodes link the object detection grammars to image data by evaluating the appearance features, And-nodes account for the geometric deformations between their child nodes, and Or-nodes select the best solution (i.e., the one with maximal score) among their child nodes. So, the scoring function of the AOG consists of two terms: appearance (i.e., the data term) and deformation (i.e., the relation term). Formally, an AOG is specified by a 5-tuple,

$$\mathcal{G} = (O, V_{And}, V_T, \Theta^{app}, \Theta^{def}) \quad (1)$$

where $\Theta^{app}$ are the parameters of the appearance scoring function for placing terminal-nodes in images, and $\Theta^{def}$ are the parameters of the deformation cost of a placed terminal-node with respect to its anchor location. They will be learned jointly by LSSVM.

Part-sharing in the AOG. Among the child single car And-nodes decomposed from the car pair And-nodes, some are often of the same type (such as side-view or frontal-view cars) but with different occlusions. So, they can share part appearance templates, although they may have different deformation models. Part-sharing supplies more data for training the part appearance parameters, and also reduces run-time in detection.

2.2 The scoring function of an AOG

Let $\Lambda$ be the image lattice and $I_\Lambda$ an image defined on $\Lambda$. In detection, we need to search over scales to detect objects of different sizes. In practice, a feature pyramid of $I_\Lambda$ is generated, denoted by $H$ (e.g., the HOG feature pyramid used in the DPM [5] and in our method).
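To make this step concrete, below is a minimal sketch of building such a multi-scale feature pyramid. The cell-averaging compute_features function is only a stand-in for the HOG features actually used, and the pyramid parameters (scales per octave, minimum size) are illustrative assumptions rather than the authors' settings.

```python
import numpy as np
from scipy.ndimage import zoom  # simple image rescaling

def compute_features(image, cell_size=8):
    """Placeholder feature extractor standing in for HOG: averages pixels
    over cell_size x cell_size cells (HOG would produce a histogram per cell)."""
    h, w = image.shape[:2]
    h_c, w_c = h // cell_size, w // cell_size
    cells = image[:h_c * cell_size, :w_c * cell_size]
    cells = cells.reshape(h_c, cell_size, w_c, cell_size)
    return cells.mean(axis=(1, 3))

def build_feature_pyramid(image, scales_per_octave=5, min_size=32):
    """Return a list of (scale, feature_map) pairs over a geometric scale range."""
    step = 2.0 ** (1.0 / scales_per_octave)
    pyramid, scale = [], 1.0
    while min(image.shape[0] * scale, image.shape[1] * scale) >= min_size:
        resized = zoom(image, scale, order=1)
        pyramid.append((scale, compute_features(resized)))
        scale /= step
    return pyramid

# Usage: every pyramid level is matched against the same fixed-size templates,
# so one car-pair template can detect pairs of different sizes.
img = np.random.rand(480, 640)
levels = build_feature_pyramid(img)
```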

When placing an AOG in $I_\Lambda$ at a location $u \in \Lambda$, we have:

(i) The scoring function for evaluating an Or-node $O$ at $u$ is defined by

$$\mathrm{Score}(O, u) = \max_{A \in ch(O)} \mathrm{Score}(A, u) \quad (2)$$

where $ch(O) \subseteq V_{And}$ is the set of child And-nodes of the Or-node $O$. We can then assign the branching variable $\omega(O) = \arg\max_{A \in ch(O)} \mathrm{Score}(A, u)$.

(ii) The scoring function for computing an And-node $A$ with respect to a placed Or-node $O$ at $u$ is defined by

$$\mathrm{Score}(A, u \mid O, u) = \langle \theta^{app}_{t_A}, \Phi^{app}(H, A, u) \rangle + \sum_{c \in ch(A)} \mathrm{Score}(c \mid A, u) \quad (3)$$

where the first term is the appearance score of the terminal-node $t_A$ terminated directly from the And-node $A$, $\theta^{app}_{t_A} \in \Theta^{app}$ are the corresponding appearance parameters, $\Phi^{app}(H, A, u)$ are the features extracted from the feature pyramid, and $ch(A) \subseteq V_{And} \cup V_T$ is the set of child nodes of $A$.

(iii) The scoring function for computing an And-node $A_1$ with respect to a placed And-node $A$ at $u$ is defined by

$$\mathrm{Score}(A_1 \mid A, u) = \max_{v \in \Lambda} \Big( \langle \theta^{app}_{t_{A_1}}, \Phi^{app}(H, A_1, v) \rangle - \langle \theta^{def}_{A_1 \mid A}, \Phi^{def}_{A_1 \mid A}(v, u) \rangle + \sum_{t \in ch(A_1)} \mathrm{Score}(t \mid A_1, v) \Big) \quad (4)$$

where $\theta^{def} \in \Theta^{def}$ is the deformation parameter of a node (such as $A_1$) with respect to its parent node (such as $A$), and $\Phi^{def}(v, u)$ is the deformation feature, for which we adopt the same quadratic function as used in the DPM [5]: $\Phi^{def}(v, u) = [dx^2, dx, dy^2, dy]$, where $(dx, dy)$ is the displacement between $v$ and $u$. The best placement of node $A_1$ is retrieved by taking the argmax over $v \in \Lambda$ in Eqn. 4.

(iv) For computing a part terminal-node $t$ with respect to a placed parent And-node $A$ at $u$, the scoring function is defined by

$$\mathrm{Score}(t \mid A, u) = \max_{v \in \Lambda} \big( \langle \theta^{app}_{t}, \Phi^{app}(H, t, v) \rangle - \langle \theta^{def}_{t \mid A}, \Phi^{def}_{t \mid A}(v, u) \rangle \big) \quad (5)$$

where, in practice, we often place node $t$ at twice the spatial resolution relative to node $A$ to capture more detailed information.
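As an illustration of Eqns. 4-5, the following is a brute-force sketch of scoring a part terminal-node against its anchor: it maximizes a precomputed appearance response minus the quadratic deformation cost over a local search window. In practice this maximization is carried out for all anchors at once with the generalized distance transform [6]; the array names and search radius here are assumptions for illustration only.

```python
import numpy as np

def deformation_feature(dx, dy):
    """Quadratic deformation feature [dx^2, dx, dy^2, dy] as in Eq. (4)."""
    return np.array([dx * dx, dx, dy * dy, dy], dtype=float)

def score_part(appearance_map, theta_def, anchor, search_radius=8):
    """Brute-force version of Eq. (5): maximize, over placements v near the anchor,
    the appearance score minus the deformation cost.

    appearance_map[y, x] is assumed to hold <theta_app_t, Phi_app(H, t, (y, x))>,
    i.e. the part filter response precomputed over the feature map.
    Returns (best_score, best_location)."""
    ay, ax = anchor
    h, w = appearance_map.shape
    best_score, best_loc = -np.inf, anchor
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            y, x = ay + dy, ax + dx
            if 0 <= y < h and 0 <= x < w:
                s = appearance_map[y, x] - theta_def @ deformation_feature(dx, dy)
                if s > best_score:
                    best_score, best_loc = s, (y, x)
    return best_score, best_loc

# Usage: one part filter response map and a learned 4-d deformation parameter.
resp = np.random.randn(40, 60)
theta = np.array([0.05, 0.0, 0.05, 0.0])   # penalizes large displacements
score, loc = score_part(resp, theta, anchor=(20, 30))
```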

3 The DP Algorithm for Detection

In detection, we first find all the locations in the image pyramid where the scores of the placed AOGs are higher than an estimated threshold $\tau$; for example, at the original resolution we keep $\{u : \mathrm{Score}(O, u) > \tau, u \in \Lambda\}$, and then utilize NMS to obtain the final detection results. Since the AOG is directed and acyclic, its scoring function is evaluated in two phases by the DP algorithm: (i) a bottom-up phase that computes all the appearance score maps of the terminal-nodes, as well as their transformed maps for the different parent nodes, computed with the efficient generalized distance transform [6]; and (ii) a top-down phase that retrieves the configurations (i.e., the locations of the car pair, the single cars and the parts) at all locations whose scores are greater than the threshold $\tau$, followed in practice by a post-processing NMS step. We omit the standard details of the DP algorithm due to limited space and refer the reader to [5]. By the top-down back-tracing, we obtain the decoupled single cars from the detected occluding car pairs. Note that we may obtain two inferred locations for a single car that is shared by two adjacent car pairs, when the single car appears in the middle of a line of multiple occluding cars. In that case, we take as the final detection result for that single car the location decoupled from the detected car pair with the higher score.

4 Learning the AOG by Latent Structural SVM

In this section, we formulate the learning of the AOG under the latent structural SVM (LSSVM) framework [13, 14, 16], which has been widely used in the literature of object detection and machine learning.

Training data. We collect roadside cars from a real traffic video surveillance environment and annotate the bounding boxes of both the occluding car pairs and the corresponding two single cars. When labeling occluded single cars, we annotate their whole bounding boxes. Note that some cars may be used twice, in two adjacent car pairs, when they appear in the middle of a line of multiple occluding cars. These duplicated cars can be treated as bootstrapped examples when learning the appearance parameters for single cars and parts.

4.1 Latent variables in the AOG

Given the training data specified above, the AOG defined in Sec. 2.1 has the following latent variables.

The branches of the root Or-node, i.e., the mixture components of the occluding car pairs. Based on the labeled bounding boxes, we initialize them using k-means clustering (k = 3 clusters in our experiment) on the concatenated features: the aspect ratios of the three annotated bounding boxes, and the displacement between the centers of the two single cars relative to the center of the car pair (normalized by the size of the car pair bounding box). The aspect ratios of the single cars roughly indicate viewpoints, the displacement gives a clue about the configuration of the car pair, and the aspect ratio of the car pair reflects the degree of occlusion. In training, we also incorporate left-right flipped examples, as done in [4]. So, we have 6 car pair models in total.
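The following is a minimal sketch of this mixture-component initialization under stated assumptions: each annotated car pair is described by the aspect ratios of its three boxes plus the normalized displacement between the two car centers, and the resulting vectors are clustered with k-means. The exact feature layout and the use of scikit-learn's KMeans are illustrative choices, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def pair_feature(pair_box, car1_box, car2_box):
    """Concatenate the aspect ratios of the three boxes with the displacement
    between the two single-car centers, normalized by the pair box size.
    Boxes are (x1, y1, x2, y2)."""
    def aspect(b):
        return (b[2] - b[0]) / float(b[3] - b[1])
    def center(b):
        return np.array([(b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0])
    pw, ph = pair_box[2] - pair_box[0], pair_box[3] - pair_box[1]
    disp = (center(car1_box) - center(car2_box)) / np.array([pw, ph], dtype=float)
    return np.array([aspect(pair_box), aspect(car1_box), aspect(car2_box), disp[0], disp[1]])

def init_mixture_components(annotations, k=3, seed=0):
    """annotations: list of (pair_box, car1_box, car2_box) tuples.
    Returns one cluster index per annotated car pair."""
    feats = np.stack([pair_feature(*a) for a in annotations])
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(feats)

# Usage with dummy boxes: each label picks one of the 3 car-pair And-nodes;
# left-right flipping of the training data then doubles this to 6 components.
rng = np.random.default_rng(0)
anns = []
for _ in range(30):
    w = rng.uniform(150, 250)
    anns.append(((0, 0, w, 80), (0, 0, 0.6 * w, 80), (0.3 * w, 5, w, 80)))
labels = init_mixture_components(anns)
```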

We train the initial AOG (consisting of the root Or-node, the six car pair And-nodes, the twelve single car And-nodes, and the terminal-nodes corresponding to these And-nodes) under the LSSVM framework by treating the locations and sizes of the car pairs and single cars as hidden variables anchored at the annotated bounding boxes. At each step of re-labeling the positive examples (i.e., assigning the latent variables) in learning, we force the assignment of car pair terminal-nodes to overlap with the ground truth by more than 0.7, and that of single car terminal-nodes by more than 0.8.

The part configurations of single cars and part-sharing. After the initial AOG is trained, we initialize the part configurations of the single cars based on the learned single car templates, similar to the greedy pursuit method used in the DPM [5]. We use 8 rectangular parts of equal size for each single car. For part-sharing, we use a method similar to that in [11], resulting in 30 part terminal-nodes in total.

4.2 Learning by LSSVM

Denote the set of positive training images by $D_+ = \{(I_1, y_1, z_1), \ldots, (I_n, y_n, z_n)\}$, where $y_i = 1$ and $z_i = (\omega_i, B_i, P_i)$ consists of (i) the Or-node branching variable $\omega_i$ (i.e., the mixture component index); (ii) the three labeled bounding boxes $B_i$ of the car pair and the two single cars, respectively; and (iii) the bounding boxes $P_i$ of the parts of the single cars. The $z_i$'s are treated as latent variables during learning, with different initializations: $\omega_i$ is initialized by the k-means clustering stated above, $B_i$ by the annotated bounding boxes, and $P_i$ by the greedy pursuit and part-sharing strategy stated above. Let $D_- = \{(I_{n+1}, y_{n+1}), \ldots, (I_N, y_N)\}$ be a set of negative training images (i.e., images without cars), where $y_i = -1$. We first train the initial AOG using $z_i = (\omega_i, B_i)$, and then initialize $P_i$ and learn the full AOG using $z_i = (\omega_i, B_i, P_i)$. Both are done under the LSSVM framework.

Given $z$, the scoring function is a linear function,

$$\mathrm{Score}(I, y, z; \Theta) = \langle \Theta, \Phi(I, y, z) \rangle \quad (6)$$

where $\Theta = (\Theta^{app}, \Theta^{def})$ and $\Phi(I, y, z) = (\Phi^{app}(I, y, z), \Phi^{def}(y, z))$ are specified by Eqn. 3, Eqn. 4 and Eqn. 5. Under the LSSVM framework, we learn $\Theta$ by minimizing the following surrogate loss [14, 16],

$$\min_\Theta \; \frac{1}{2}\|\Theta\|_2^2 + \frac{C}{N}\sum_{i=1}^{N}\Big[\max_{y,z}\big(\mathrm{Score}(I_i, y, z; \Theta) + \Delta(y_i, y, z)\big) - \max_{z}\mathrm{Score}(I_i, y_i, z; \Theta)\Big] \quad (7)$$

where the loss $\Delta(y_i, y, z) = 1$ if $y \neq y_i$ and $0$ otherwise, and $C$ is the trade-off parameter balancing the regularization term and the surrogate loss term.
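To make the structure of Eqn. 7 concrete, here is a schematic sketch of the alternating optimization commonly used for such latent-variable objectives (the paper's actual procedure, the CCCP, is detailed next): latent variables of the positives are imputed with the current model, and a convex structural-SVM-style subgradient update is then applied. The routines infer_latent, loss_augmented_inference and feature are placeholders for the DP-based inference and the joint feature map of Eqn. 6; this is not the authors' implementation.

```python
import numpy as np

def train_lssvm(theta, positives, negatives, feature, infer_latent,
                loss_augmented_inference, C=0.002, lr=1e-3,
                outer_iters=5, inner_iters=100):
    """Schematic latent structural SVM training (cf. Eqn. 7).

    positives/negatives: lists of images with labels y = +1 / -1.
    feature(I, y, z)          -> joint feature vector Phi(I, y, z)
                                 (assumed to return a zero vector for y = -1, z = None)
    infer_latent(theta, I, y) -> argmax_z Score(I, y, z; theta)  (DP inference)
    loss_augmented_inference(theta, I, y_true) -> (y_hat, z_hat) maximizing Score + Delta
    All three are placeholders for the model-specific routines."""
    data = [(I, +1) for I in positives] + [(I, -1) for I in negatives]
    N = len(data)
    for _ in range(outer_iters):
        # Step 1: impute latent variables of the positives with the current model
        # (this fixes the concave part of the objective, cf. the CCCP upper bound).
        imputed = {i: infer_latent(theta, I, y)
                   for i, (I, y) in enumerate(data) if y == +1}
        # Step 2: approximately solve the resulting convex problem by subgradient descent.
        for _ in range(inner_iters):
            grad = np.copy(theta)  # gradient of the 0.5 * ||theta||^2 term
            for i, (I, y) in enumerate(data):
                y_hat, z_hat = loss_augmented_inference(theta, I, y)
                z_true = imputed[i] if y == +1 else None
                violation = (theta @ feature(I, y_hat, z_hat) + float(y_hat != y)) \
                            - (theta @ feature(I, y, z_true))
                if violation > 0:
                    grad += (C / N) * (feature(I, y_hat, z_hat) - feature(I, y, z_true))
            theta -= lr * grad
    return theta
```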

The objective function in Eqn. 7 is non-convex, and the concave-convex procedure (CCCP) [15, 14, 16] is used to obtain a local optimum. First, Eqn. 7 can be re-written as

$$\min_\Theta \; \underbrace{\frac{1}{2}\|\Theta\|_2^2 + \frac{C}{N}\sum_{i=1}^{N}\max_{y,z}\big(\mathrm{Score}(I_i, y, z; \Theta) + \Delta(y_i, y, z)\big)}_{f(\Theta),\ \text{convex function}} \; \underbrace{- \; \frac{C}{N}\sum_{i=1}^{N}\max_{z}\mathrm{Score}(I_i, y_i, z; \Theta)}_{g(\Theta),\ \text{concave function}} \quad (8)$$

Then, at step $t$, based on the current solution $\Theta_t$, the CCCP solves the problem with the following two steps.

(i) Bounding $g(\Theta)$ from above (since it is concave), i.e., finding a hyperplane $p_t$ such that
$$g(\Theta) \leq g(\Theta_t) + (\Theta - \Theta_t) \cdot p_t.$$
To do so, we first obtain the best latent variable assignment for each positive example by solving $z_i^{*} = \arg\max_{z_i} \mathrm{Score}(I_i, y_i, z_i; \Theta_t)$ using the DP algorithm. Then $p_t$ is constructed by
$$p_t = -\frac{C}{N}\sum_{i=1}^{N}\Phi(I_i, y_i, z_i^{*}).$$

(ii) Updating the solution by $\Theta_{t+1} = \arg\min_\Theta \big(f(\Theta) + \Theta \cdot p_t\big)$. This step amounts to a standard structural SVM, which can be solved with different off-the-shelf solvers such as the cutting plane method; the details are referred to [13, 16].

Figure 3 shows a portion of our learned AOG model: the first layer corresponds to car pairs, the second layer to single cars, and the third layer to car parts. Beside each node in the AOG, we visualize the learned appearance and deformation templates.

5 Experiments

To evaluate the proposed method, we collected 482 car images from street-view scenes and annotated the bounding boxes of both car pairs and single cars. In detail, we obtained 1380 car pairs, 2760 occluded single cars and 702 unoccluded single cars. We randomly select 200 images for training and use the rest for testing. For the negative set, we use the negative training images of the PASCAL VOC 2007 database [3]. We also follow the VOC protocol for reporting results [3]: a putative bounding box is considered correct if the intersection of the bounding box with the ground-truth bounding box is greater than 50% of their union; multiple detections of the same ground truth are penalized. We compute Precision-Recall (PR) curves and report the average precision (AP) over our test set.
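For reference, a minimal sketch of this intersection-over-union criterion is given below; the (x1, y1, x2, y2) box format and the function names are illustrative assumptions.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_correct(detection, ground_truth, threshold=0.5):
    """VOC-style criterion: a detection counts as correct if IoU >= 0.5."""
    return iou(detection, ground_truth) >= threshold

# Example: a detection shifted by 20 pixels against a 100x50 ground-truth box.
print(is_correct((20, 0, 120, 50), (0, 0, 100, 50)))  # True (IoU = 2/3)
```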

In experiments, we compare our AOG with two baseline DPMs. Baseline 1: a DPM trained using the occluded single cars in the training set. Baseline 2: a DPM trained using all the single cars in the training set. Fig. 4 shows the PR curves of the three methods, where the proposed method outperforms the two baseline DPMs significantly (by 9.5% and 12.3% AP, respectively).

Fig. 4. Precision-Recall curves of our model and the baseline methods.

Figure 5 shows detection results of both the DPM car model (Baseline 1 is used since it is better than Baseline 2 according to the PR curves) and our AOG model. Figure 6 shows some examples of the layered detection results of our AOG model: on the top, we show the detection results of the car pair model in our AOG; on the bottom, the detection results of single cars obtained with the full AOG model are shown. Here we can see that our model also lends itself to fast coarse-to-fine detection, which we will further investigate in ongoing work.

6 Conclusion

In this paper, we proposed a hierarchical And-Or directed acyclic graph (AOG) model to address the problem of X-to-X-occlusion-free object detection. The model is a grammar model. It consists of (i) a root Or-node representing a mixture of different types of occluding X pairs, (ii) a set of And-nodes representing the different types of occluding X pairs, (iii) another set of And-nodes representing the occluding single X's decomposed from the X pairs, and (iv) a set of terminal-nodes representing the appearance templates for the X pairs, the single X's and the latent parts of the single X's.

Fig. 5. Comparison of the DPM car model and our hierarchical AOG model. The first row and the third row show the detection results (blue bounding boxes) of the DPM car detector; the second and the fourth rows show the detection results (red bounding boxes) of the proposed AOG model. Best viewed in color.

Fig. 6. Layered detections of our AOG model. Top: detection results of the car pair module by coupling in the first layer. Bottom: detection results of the single car module by decoupling in the second layer.

The part appearance templates can also be shared among different And-nodes of single X's. The model is learned by the latent structural SVM (LSSVM), and the DP algorithm is used for inference. Our model is a general one: although we only use cars as the running example in this paper, it can potentially be applied to other objects.

Acknowledgement. We thank the three anonymous reviewers for their helpful comments. This work is supported by the China 973 Program under Grant No. 2012CB316300 and by the Natural Science Foundation of China.

References

1. Choi, J.Y., Sung, K.S., Yang, Y.K.: Multiple Vehicles Detection and Tracking based on Scale-Invariant Feature Transform. In: ITSC (2007)
2. Dalal, N., Triggs, B.: Histograms of Oriented Gradients for Human Detection. In: CVPR (2005)
3. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. (2007)
4. Felzenszwalb, P.F., Girshick, R.B., McAllester, D.: Discriminatively Trained Deformable Part Models, Release 4. (pff/latentrelease4/) (2010)
5. Felzenszwalb, P.F., Girshick, R.B., McAllester, D.A., Ramanan, D.: Object Detection with Discriminatively Trained Part-Based Models. TPAMI 32 (2010)
6. Felzenszwalb, P.F., Huttenlocher, D.P.: Distance Transforms of Sampled Functions. Technical report, Cornell University CIS (2004)
7. Gupte, S., Masoud, O., Martin, R.F.K., Papanikolopoulos, N.P.: Detection and Classification of Vehicles. TITS 3 (2002)
8. Lai, A.H.S., Fung, G.S.K., Yung, N.H.C.: Vehicle Type Classification from Visual-based Dimension Estimation. In: ITSC (2001)
9. Leotta, M.J., Mundy, J.L.: Vehicle Surveillance with a Generic, Adaptive, 3D Vehicle Model. TPAMI 33 (2011)
10. Liu, X., Dai, B., He, H.: Real-Time On-Road Vehicle Detection Combining Specific Shadow Segmentation and SVM Classification. In: ICDMA (2011)
11. Ott, P., Everingham, M.: Shared Parts for Deformable Part-based Models. In: CVPR (2011)
12. Petrovic, V.S., Cootes, T.F.: Analysis of Features for Rigid Structure Vehicle Type Recognition. In: BMVC (2004)
13. Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y.: Large Margin Methods for Structured and Interdependent Output Variables. JMLR 6 (2005)
14. Yu, C.N.J., Joachims, T.: Learning Structural SVMs with Latent Variables. In: ICML (2009)
15. Yuille, A.L., Rangarajan, A.: The Concave-Convex Procedure (CCCP). In: NIPS (2001)
16. Zhu, L., Chen, Y., Yuille, A.L., Freeman, W.T.: Latent Hierarchical Structural Learning for Object Detection. In: CVPR (2010)
17. Zhu, S.C., Mumford, D.: A Stochastic Grammar of Images. FTCGV 2 (2006)
