Detection and Handling of Occlusion in an Object Detection System


R.M.G. Op het Veld,a R.G.J. Wijnhoven,b Y. Bondarau,c and Peter H.N. de With,d
a,b ViNotion B.V., Horsten 1, 5612 AX, Eindhoven, The Netherlands; a,c,d Eindhoven University of Technology, Den Dolech 2, 5612 AZ, Eindhoven, The Netherlands

ABSTRACT
Object detection is an important technique for video surveillance applications. Although many detection algorithms have been proposed, they all have problems in detecting occluded objects. In this paper, we propose a novel system for occlusion handling and integrate it in a sliding-window detection framework using HOG features and linear classification. Occlusion handling is obtained by applying multiple classifiers, each covering a different level of occlusion and focusing on the non-occluded object parts. Experiments show that our approach, based on 17 classifiers, obtains an increase of 8% in detection performance. To limit computational complexity, we propose a cascaded implementation that increases the computational cost by only 3.4%. Although the paper presents results for pedestrian detection, our approach is not limited to this object class. Finally, our system does not require an additional training dataset that covers all possible types of occlusion.

Keywords: Object Detection, Occlusion Handling, Histogram of Oriented Gradients (HOG)

1. INTRODUCTION
In this paper we focus on object detection, in particular but not exclusively, the detection of humans in the domain of video surveillance. Detection of objects is a challenging task because of large variations in lighting (sun, shadows), object position and size, object deformations (shape) and large intra-class variations in object and background. Although the quality of detection algorithms is constantly improving and partially solves the previous challenges, state-of-the-art methods still struggle to detect objects that are occluded or in unusual poses.
Occlusion is a particular problem that differs from the previous challenges, since it takes away part of the object information. The variation and amount of occlusion form a problem of their own, which has not been broadly studied, so we specifically investigate the handling of occlusions in this paper. Some typical occlusions are visualized in Figure 1. Popular object detection algorithms use a sliding-window detection stage, where a classification window is evaluated at different positions in the image. At each search position, the local image region is classified into object/background. To remove variations in contrast and lighting conditions, the raw intensity values of the image pixels are typically first transformed into an invariant feature space. A popular feature descriptor for object characterization is the Histogram of Oriented Gradients (HOG).2 The obtained feature description is then classified by a linear Support Vector Machine (SVM),3 which is selected for its simplicity and good performance. A well-known dataset for occlusion experiments is the Caltech Pedestrian dataset, which focuses on pedestrian detection in an urban environment. Here, over 70% of all pedestrians are occluded in at least a single video frame. Statistics on these occlusions show that 95% of all occlusions in this dataset occur from the bottom, the right and the left of the pedestrians.4,5 This aspect will be specifically addressed later in this paper. Our work concentrates on improving an existing real-time sliding-window object detection system that uses linear classification (SVM). To this end, we explore the detection of occluded regions and compare this with the detection of regions without occlusion. The first approach focuses on the detection of occlusions using the classification score, whereas the second approach focuses on the detection of non-occluded regions using multiple classifiers in parallel, each dealing with different partial occlusions.
We will evaluate both approaches and show that the latter approach is best suited. The remainder of the paper is organized as follows. We introduce related work in Section 2. Our existing object detection system is described in Section 3. Then, Section 4 describes our implementations of the two evaluated approaches. Section 5 outlines the applied datasets and experimental results. The results are discussed in Section 6, followed by the conclusions in Section 7.

Figure 1. Typical occlusions in crowded scenes with pedestrians.

2. RELATED WORK AND DETAILED PROBLEM STATEMENT
Approaches from literature to handle such occlusions are mainly divided into two groups. The first group focuses on the detection of occlusions, whereas the second group concentrates on the detection of non-occluded regions. Detection of occlusions is proposed in the following studies. Pixel-level segmentation methods6-10 obtain a good detection performance, but result in high computational cost. Such methods distinguish different objects based on the segmentation outcome and use the pixel data of the segmented areas to classify the objects. To reduce the computational cost, the segmentation resolution can be reduced from pixel level to e.g. cell-level segmentation.11 These techniques require pixel-level annotated data for training, which is preferably avoided. In a more detailed approach, Monroy and Ommer7 propose a model-driven method to learn object shapes without requiring segmented training data. Object models for detection are learned by explicit representations of object shapes and their segregation from the background. All learned shape models have to be matched with every detection window, which is not feasible in real time. Wang et al.12 propose a technique that exploits classification scores per cell in a sliding-window detection system. They integrate the concept of object segmentation within the detection window and ignore the influence of occluded areas in the classification score for this window. Based on the localization of the occluded region, either an upper-body or a lower-body classifier is evaluated on the non-occluded regions. This method is feasible under real-time requirements and only requires bounding-box annotated training data, instead of pixel-segmented training data. Their approach will be addressed further in Section 4.1. Marín et al.13 expand upon the work of Wang et al.12 Instead of using a partial classifier for ambiguous detections, these detections are evaluated using an ensemble of random subspace classifiers. A validation set is used to select the best subset of classifiers, which forms a considerable drawback, as this dataset needs to cover all possible types of occlusions. The detection of non-occluded regions forms the second type of approach. An evident first step is to extend the image description with more informative features. As an example, HOG has already been extended with Local Binary Patterns (LBP),12 Color Self-Similarity (CSS)14 and Histograms Of Flow (HOF).14,15 Although this approach is useful, it emphasizes the object itself, not the occlusions, so that it has limited or no impact on handling the occlusions. As an alternative, part-based detectors are partially robust against occlusions, because non-occluded parts are detected normally and occluded parts are missed. Tang et al.21 propose training multiple occlusion-aware classifiers for pairs of specific combinations of occluding and occluded objects (e.g. person-to-person occlusion). Although this approach provides good results for pairs of pedestrians, different classifiers have to be trained for every other possible type of occlusion, which is not feasible for a real-world application. Mathias et al.22 also use multiple classifiers that are evaluated for different sizes and positions of occlusions. Summarizing, we have adopted the work of Wang et al.12 as a starting point, because it is based on the generic sliding-window detection approach and requires only limited additional complexity. To evaluate the suitability, we have also designed and implemented a system that focuses on the detection of non-occluded regions using multiple classifiers, which is in line with Mathias et al.22 This second method is chosen to have an alternative approach that is based on the same detection architecture, but uses a conceptually different solution.

Figure 2. Schematic overview of our object detection system. Training is performed offline in the top-left part (Blocks A and B) and normal (online) detection is depicted in the lower part (Blocks C-G).

3. BASELINE OBJECT DETECTION SYSTEM
The baseline object detection system is based on the Histogram of Oriented Gradients (HOG) by Dalal and Triggs.2 The input image is first transformed into an invariant feature space that models object shape using orientation histograms. Object detection is performed by sliding a detection window over the image and classifying the feature description of this window into object/background. To detect objects of different sizes, the detection process is repeated for scaled versions of the input image. Finally, the window-level detections belonging to the same object are merged into final detections. The total system is depicted in Figure 2. Prior to object detection, the system is trained using example images of the object (positive) and the background (negative). The images are transformed into the HOG feature space (Block A) and the resulting feature descriptors are used to train a classifier (Block B). After training (normal system operation), input images are first converted into the HOG feature space (Block C). The trained classifier is then used to detect objects in the sliding-window detection stage (Block D), which evaluates the image features. At each image search position, the classifier returns a confidence score, which is thresholded (t_class) to obtain the detections (Block E). Because the object will be detected at several search positions and at several scales, all detections are merged by spatial clustering (Block F).
Finally, these merged detections are also thresholded (t_final) to obtain the final detections (Block G). To describe the input image, we use the HOG feature transform.2 The image is divided into a spatial grid of cells of size 8×8 pixels. Gradient orientation information is calculated for each pixel and combined in a histogram for each cell. Orientation information is quantized into 9 bins and weighted by the gradient magnitude. Pixel gradients are calculated by filtering with [-1, 0, 1] filters in both spatial dimensions. To normalize for image contrast, the histograms are normalized using L2 energy normalization. Each cell histogram is normalized multiple times, using the energy of all blocks of size 2×2 cells of which the cell is part. The feature vectors for all blocks belonging to the same detection window are concatenated and form the feature descriptor for this window. A visual representation of the complete process is shown in Figure 3.

Figure 3. Visual illustration of the HOG feature calculation process.

To obtain object detections, the feature space is searched in a sliding fashion and the feature vector at each position is classified into object/background. During training, the linear classification decision boundary is created using a Support Vector Machine (SVM) classifier. After training, the resulting linear classifier is used for object detection, where the linear classification function is f(x) = ω^T·x + b, where b is the bias, ω the weight vector (the normal of the hyperplane) and x the concatenation of all features for the detection window (the feature vector).
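To make the sliding-window stage concrete, the following Python sketch evaluates the linear decision function f(x) = ω^T·x + b at every position of a cell-level feature map. The shapes, names and the simple double loop are our own illustration; the actual system operates on HOG descriptors and is heavily optimized.

```python
import numpy as np

# Sketch of the sliding-window classification of Section 3 (hypothetical
# sizes; the real system uses HOG descriptors over 8x8-pixel cells).
def score_window(features, w, b):
    """Linear SVM decision function f(x) = w^T x + b."""
    return float(np.dot(w, features) + b)

def sliding_window_scores(feature_map, w, b, win_h, win_w):
    """Evaluate the classifier at every window position of a cell-level
    feature map of shape (H, W, D); returns one score per position."""
    H, W, D = feature_map.shape
    scores = np.zeros((H - win_h + 1, W - win_w + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            # Concatenate the cell descriptors covered by the window.
            window = feature_map[y:y + win_h, x:x + win_w].ravel()
            scores[y, x] = score_window(window, w, b)
    return scores
```

In the full system, this evaluation is repeated on rescaled versions of the image so that objects of different sizes are detected with one fixed-size window.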

4. APPROACH
The previously described sliding-window-based detection system is now extended with occlusion handling, where we focus on two different approaches. The first occlusion-handling approach detects occlusions using the cell scores of a full-window classifier. The second approach introduces a novel algorithm that focuses on designing a more occlusion-robust system by combining detections from multiple different classifiers.

4.1 Approach 1: occlusion detection
The first approach is based on the work of Wang et al.12 In case the score of the full-window classifier is ambiguous (t_lower < score < t_higher), an occlusion may be present and the system uses the scores of the individual cells to segment the detection window into positive and negative regions. Negative regions typically do not contain an object and are ignored. Positive regions, possibly containing an object, are evaluated using a second, partial classifier. Based on the score of the partial classifier, either the score of the full-window classifier (s_glob), the score of the partial classifier (s_part), or a combination of both is used. If the score of the partial classifier is not sufficiently reliable (< t_conf), the global and partial classifiers are combined by a weighted combination of the scores, according to score = w_part·s_part + w_glob·s_glob. Negative cells decrease the score for the complete window, which can result in a misclassified sample. To separate negative cells from positive cells, first the score per cell needs to be computed. The score for a window is computed using the linear classification function from Section 3. The weight vector ω consists of weights for each individual cell. This vector is multiplied with the feature vector for the detection window, resulting in a score per cell. In order to obtain the full-window classification score, a bias b is added, either to the full window or per cell.12 The computed cell bias values are shown in Figure 4(b).
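The per-cell score computation described above can be sketched as follows. The array shapes and the per-cell bias vector are illustrative assumptions, following the decomposition of Wang et al.12

```python
import numpy as np

# Sketch of the per-cell score decomposition of Section 4.1 (hypothetical
# shapes; cell_bias distributes the window bias b over the cells).
def per_cell_scores(features, w, cell_bias):
    """Split the window score w^T x + b into one score per HOG cell.

    features, w: arrays of shape (n_cells, dims_per_cell)
    cell_bias:   array of shape (n_cells,) summing to the window bias b
    """
    return (features * w).sum(axis=1) + cell_bias

def combined_score(s_part, s_glob, w_part=0.3, w_glob=0.7):
    # Weighted combination used when the partial score is unreliable.
    return w_part * s_part + w_glob * s_glob
```

Summing the per-cell scores recovers the full-window score, so positive and negative regions can be formed without re-evaluating the classifier.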
Using these bias values per cell, the classification scores per cell can be calculated. Based on these scores, cells can be merged into different positive or negative regions. Negative regions most likely indicate occlusion or clutter, and it is expected that ignoring these regions will improve the detection score. The merging of the noisy per-cell classification scores is implemented with a Mean-shift23 segmentation algorithm. For this purpose, two kernels are employed, a Gaussian spatial kernel and a linear kernel, as specified by

G(d) = exp(-d^2 / (2σ^2)),    G(s_1, s_2) = { w_ms·s_2 if s_1 > 0;  0.5·w_ms·s_2 if s_1 ≤ 0 }.    (1)

Here, d is the Euclidean distance between two cells, σ = 1, and s_1 and s_2 are the scores of the compared cells.

4.2 Approach 2: multiple classifiers for non-occluded regions
In this approach, we pursue the novel concept of assigning a classifier to each different non-occluded region. Each classifier is trained for a certain occlusion pattern, as shown in Figure 6(a). During detection, all classifiers are evaluated at each sliding-window search position. We assume that in case of occlusion, there will be at least one classifier in the total set that matches the current type of occlusion, so that a feasible detection is computed. It is important that the occlusion patterns and the related classifiers are representative of typical occlusion cases. We have designed 29 different classifiers, as shown in Figure 6(a). It seems straightforward to always choose the classifier that covers the smallest object region (largest occlusion region), but we will show in our experiments that such a classifier, despite being more robust to occlusion, always decreases the detection performance. For this reason, once the individual classifiers are designed and selected, we combine individual classifiers to create more robust classifiers for large occlusions. Since each classifier is trained independently, the margins of the linear classifiers are different.
Classifiers covering a larger object region incorporate more object information, so that they have better-defined margins and will perform better. For combining multiple classifiers, several metrics can be used. To calibrate each individual classifier, classifiers can be normalized using the maximum achievable detection score, as described by Mathias et al.,22 which is rather dataset-dependent. Therefore, we propose an alternative: normalizing each classifier by scaling the weight vector ω (and its bias) of the linear classifier to unity energy. Furthermore, we calibrate t_class for all classifiers using a fixed false-positive rate, to statistically equalize the number of false detections. Our normalization is not only attractive for individual classifier calibration, but also paves the way for combining classifiers at a later stage. Apart from individual normalization, the influence of each individual classifier on the combined result can be varied. This can be performed by imposing a weight of unity minus the occlusion level22 for each classifier. As in Mathias et al.,22 we assume that a small-object-region classifier always performs worse than a large-object-region classifier, so that the weight is adapted accordingly. In initial experiments, we have compared the four possible combinations of selectively applying individual classifier normalization and applying weighting by the occlusion level when combining the individual classifiers. These initial experiments have revealed that it is always preferred to apply both measures simultaneously.

4.3 Merging of detections
After we have obtained multiple detections per object during the sliding-window detection stage, these detections have to be merged. Dalal24 has proposed an elegant but computationally expensive Mean-shift procedure. A relatively simple method is proposed by Mathias et al.,22 which we refer to as NMS+Merging. We have implemented this two-stage approach because of its low computational requirements. First, all detections are sorted by their scores. Then, Non-Maximum Suppression (NMS) is applied to all detections belonging to the same classifier and the remaining detections are merged together. The criteria applied by Dollár5 are used for non-maximum suppression (see Equation (2), left). If the overlap score is larger than the threshold t_NMS, the detections are merged by ignoring the detection with the lowest score. In the second step, all remaining overlapping detections that satisfy the second overlap criterion are merged (see Equation (2), right). The NMS and merging overlap criteria are defined, respectively, as:

area(B_a ∩ B_b) / min(area(B_a), area(B_b)) > t_NMS,    area(B_a ∩ B_b) / area(B_a ∪ B_b) > t_merging.
(2)

If the overlap score is larger than the threshold t_merging, the detections are merged and their detection scores are accumulated. We have empirically determined both threshold values, with t_NMS = 0.7.

5. EVALUATION
We will evaluate the performance and efficiency of the two implemented approaches: the occlusion detection system of Wang et al.12 and our system based on multiple classifiers, where each classifier detects different non-occluded regions. More specifically, we evaluate (1) whether it is possible to use per-cell scores to detect occlusions, and (2) whether combining multiple, different classifiers can increase the detection performance. We first introduce the evaluation measures in Section 5.1 and then present the two datasets for our experiments in Section 5.2. For a better understanding of occlusion aspects related to the actual object class, we consider the importance of the region of occlusion in Section 5.3. The occlusion detection system is evaluated in Section 5.4 and our proposed multiple-classifier system is extensively discussed and evaluated in Section 5.5.

5.1 Evaluation measures
To quantify detection performance, we plot Detection Error Trade-off (DET)24 curves on a log-log scale, i.e., the miss-rate (1 - Recall, or FalseNeg/(TruePos + FalseNeg)) versus the False Positives Per Window (FPPW). Evidently, low values of the miss-rate are desirable. The chosen parameters present the same information as the Receiver Operating Characteristic (ROC), but allow small probabilities to be distinguished more easily. We will use miss-rates at 10^-4 FPPW as a reference point for the comparison of different results, since this is a realistic point of operation for a detection system. This implies that 1 out of 10,000 negative windows is misclassified as an object.
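A minimal sketch of these window-level measures (illustrative counts only):

```python
# Sketch of the window-level evaluation measures of Section 5.1.
def miss_rate(true_pos, false_neg):
    """Miss-rate = 1 - recall = FalseNeg / (TruePos + FalseNeg)."""
    return false_neg / (true_pos + false_neg)

def fppw(false_pos, num_windows):
    """False Positives Per Window over a set of negative windows."""
    return false_pos / num_windows
```

For example, 15 missed objects out of 100 gives a miss-rate of 0.15, and 5 false detections over 50,000 negative windows corresponds to the 10^-4 FPPW operating point.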
For the INRIA dataset, the images contain on average 50,000 windows per image when processing scales 1.0 and higher with a fixed scale-factor step. To measure the performance of the complete system, we use ROC curves, in which we plot the miss-rate versus the False Positives Per Image (FPPI), and compare results using the Area Under the Curve (AUC) measure, where a lower AUC indicates a better detector. We have followed the same evaluation details as described by Dollár et al.,4,5 in order to obtain comparable results.
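For completeness, the two overlap measures of Equation (2) in Section 4.3, which also underlie the detection-to-ground-truth matching in such evaluations, can be sketched as follows (boxes as (x1, y1, x2, y2) tuples; the function names are our own):

```python
# Sketch of the overlap criteria of Equation (2), Section 4.3.
def area(b):
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def intersection(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def nms_overlap(a, b):
    """area(Ba ∩ Bb) / min(area(Ba), area(Bb)), compared against t_NMS."""
    return intersection(a, b) / min(area(a), area(b))

def merging_overlap(a, b):
    """area(Ba ∩ Bb) / area(Ba ∪ Bb) (IoU), compared against t_merging."""
    inter = intersection(a, b)
    return inter / (area(a) + area(b) - inter)
```

Note that the NMS criterion divides by the smaller box area and is therefore always at least as large as the intersection-over-union criterion.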

5.2 Datasets
We determine our results on the publicly available INRIA2 dataset, but have also employed our own, more crowded Dancefestival dataset. The INRIA2 dataset contains 288 images with 1,132 person crops. The negative test set contains 453 images. The training set contains 1,208 cropped object images (doubled by horizontal mirroring) and 1,218 negative images. All results are reported for the 288 positive test images, while detecting pedestrians of 96 pixels or more in height. The negative images are used for the DET curves. Additionally, our Dancefestival dataset consists of 864 annotated persons in 38 high-resolution images, containing 323 occluded persons. This dataset is added to evaluate performance in a crowded real-world scenario with many occlusions. Evaluation results are reported for pedestrians of height 48 pixels and up. For all experiments, we have trained our classifiers on the INRIA Person training set. Since the bounding-box annotations have different aspect ratios, we normalize all boxes to have a width of 0.41 times the height, as in Dollár et al.5 We search up to 3 cells outside the image borders to find people at the image boundaries.

5.3 Importance of region of occlusion
In a first synthetic experiment, we evaluate the effect of occlusions at different object positions on the detection performance. We occlude each object image with a black rectangle at eight different vertical positions (height: 2 cells/16 pixels; width: image width/64 pixels), as shown in Figure 4(a).

Figure 4. Different synthetic image occlusions in (a) and the evaluated classification performance in (b): miss-rates at 10^-4 FPPW for occlusions vs. cell cancelling (lower is better). The average image of all test images is depicted in (b1) and the negative bias values per cell in (b2).
The images are evaluated using both our baseline system without occlusion handling and the perfect occlusion-handling method that cancels the contribution of the occluded cells in the final classification result. During canceling, the corresponding feature dimensions are set to zero, while compensating for the bias of these cells (see Section 4.1) by subtracting the corresponding values (visualized in Figure 4(b)). Note that more negative cell bias values represent more important cells. The occluded area belonging to Position 1 is indicated by the striped pattern. From the results shown in Figure 4(b), we observe that the detection performance is position-dependent and deteriorates most when adding occlusions at Positions 1 (head) and 5 (knees), which indicates that these regions have the highest importance. In general, the miss-rates are higher than the performance on the non-occluded images (15%). When adding occlusion to the bottom image part (Position 7, the area below the feet), the performance without occlusion handling increases compared to the baseline, which is caused by the absence of gradient information in this region. We can conclude that in nearly all cases, occlusion decreases the performance of the baseline system. Furthermore, even in the case of perfect occlusion detection, cell cancellation still results in a decreased performance.
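The cell-cancelling procedure of this experiment can be sketched as follows, under the same per-cell decomposition as in Section 4.1 (shapes and names are illustrative):

```python
import numpy as np

# Sketch of the cell-cancelling of Section 5.3: occluded cells are removed
# from the window score by zeroing their features and subtracting their
# share of the bias, i.e. by summing only the non-occluded cell scores.
def cancel_cells(features, w, cell_bias, occluded):
    """Window score with the contribution of occluded cells removed.

    features, w: (n_cells, dims_per_cell); cell_bias: (n_cells,)
    occluded:    boolean mask of shape (n_cells,)
    """
    keep = ~np.asarray(occluded)
    cell_scores = (features * w).sum(axis=1) + cell_bias
    return float(cell_scores[keep].sum())
```

This models a detector with perfect knowledge of the occlusion mask; the experiment above shows that even this idealized cancelling does not fully recover the non-occluded performance.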

5.4 Approach 1: occlusion detection
We will now evaluate the effect of detecting occlusions at the HOG cell level, as proposed by Wang et al.12 and introduced in Section 4.1. After an extensive study of this approach, we have found that the output of the cell-based occlusion detection (after Mean-shift segmentation) is only used to activate the two partial classifiers (upper/lower body), provided that sufficient positive object information is present. The occlusion handling in this case is only covered by the fact that a partial classifier is activated on the image part that is not occluded. Therefore, we try to answer the following three questions. (1) What is the contribution of the segmentation compared to always applying the partial classifiers? (2) More specifically, is the occlusion detection stage necessary, or is it sufficient to always evaluate the partial classifiers? (3) How is the decision made to activate the partial classifiers? To answer these questions, we evaluate the influence of the segmentation and the amount of non-occluded information (positive cells) that is required to activate the partial classifiers. Two partial classifiers of size 8×8 cells are used, one for the upper body and one for the lower body. Both are trained using 50% of the original annotations and with one round of bootstrapping. We use the following parameters: t_lower = -1, t_higher = 1, t_conf = 1.5, w_part = 0.3 and w_glob = 0.7, which are similar to the settings of Wang et al. We evaluate the effect of the weighting function from Section 4.1 by comparing with w_part = 1.0 and w_glob = 0.0. We use the number of positive cells to enable the partial classifiers and experimentally sweep this number from 0 to 64, in steps of 4 cells. Note that a minimum of 0 cells is equal to always applying the partial classifiers and a minimum of 64 cells is equal to never applying them.
When both partial classifiers are activated, s_part is equal to the maximum of both partial classifier scores. None of the classifiers is normalized. The obtained results are depicted in Figure 5, using window-level classification results. Unfortunately, we have not been able to reproduce the results published by Wang et al.12 and expect that they have performed additional processing steps not described in their paper.

Figure 5. Minimum number of positive cells in each region vs. miss-rates (at 10^-4 FPPW). Lower values are better.

The use of only one full-object classifier (a minimum of 64 positive cells) is always outperformed by the addition of partial classifiers. The best results without Mean-shift are obtained by activating the partial classifiers when at least 24 positive cells are found in each region. When enabling Mean-shift, the optimum is obtained when at least 4 positive cells are found. However, this miss-rate is equal to the lowest miss-rate obtained without Mean-shift. Overall, adding Mean-shift performs worse, or the improvement in detection performance is negligible (0.3%). Weighting (w_part = 0.3, w_glob = 0.7) is always required to compensate for the high number of false detections from the partial classifiers. Disabling weighting (w_part = 1.0, w_glob = 0.0) always results in miss-rates well above 15%; these results are therefore not shown. We have also experimented with Mean-shift on the binary cell scores, with comparable results. Although this measure shows an increase in window-level classification performance, our integration in the complete system results in a decrease in performance when measuring False Positives Per Image (FPPI) (including merging). Summarizing, using segmentation information from occlusion detection results in a negligible increase in performance compared to always applying the partial classifiers. Moreover, the performance is even lower when the method is embedded within the complete detection system (including merging).
Using the segmentation information works best when only a few positive cells are required to enable the partial classifiers, showing that the noisy cell-based classification output cannot be employed directly. From the above, we therefore conclude that occlusion detection is not necessary and that the addition of partial classifiers is the main source of the performance improvement.

5.5 Approach 2: multiple classifiers for non-occluded regions
In this section, we will evaluate the effect of detecting the non-occluded object regions. In the previous approach, we have already found that the main performance improvement originates from the application of two partial object classifiers, rather than from detecting the occluded region. We will now evaluate the concept of applying multiple classifiers that cover different non-occluded regions in more detail. First, we evaluate which classifiers should be combined to obtain optimal results. Second, we examine a method to combine multiple classifiers, while calibrating each individual classifier. Finally, we propose a cascaded implementation to lower the computational cost for a real-time implementation.

5.5.1 Classifier design and evaluation of effective object region
We already know that occlusions typically occur at certain object regions (bottom, right and left).5 In the experiment of Section 5.3, we have found that for persons, the most informative visual information is concentrated around the head area. We use these statistics and our findings to manually design 29 different classifiers, shown in Figure 6(a). The region that models the hypothesized occlusion area is ignored by the classifier and is drawn in black. To demonstrate the importance of the effective object region covered by the classifier (the classifier size), we compare 9 differently-sized classifiers, where the size of the occluded region is incrementally increased (Classifiers 0-7 and 28 from Figure 6(a)). Each classifier is independently trained on the INRIA set and the performance is evaluated by measuring the AUC. The 9 different classifiers and their detection performances are shown in Figure 6(b). Overall, we conclude that classifiers covering a larger region perform better. Classifier 1 obtains the lowest overall miss-rate, which we attribute to noise in the lower part of the region, which is ignored by this classifier.
This finding is supported by our findings from Section 5.3. We have obtained comparable results for right-to-left and left-to-right occlusions.

Figure 6. Multiple classifiers with different occlusion patterns in (a) and the detection performance (AUC, lower is better) of classifiers with increasingly smaller object area in (b). Black areas represent occlusions, where the classifier weights have value zero.

5.5.2 Selection of classifier combinations
When applying multiple classifiers using different effective object regions, each classifier will focus on different visual properties. We assume that the combination of these different classifiers will result in an improved detection performance. However, it is difficult to predict which combination of classifiers performs best. Therefore, we evaluate all 29 classifiers from Figure 6(a) and measure their effective detection performance. First, all classifiers are applied independently to the images and the detections (after merging) from all classifiers are evaluated to measure the contribution of each individual classifier. The number of occurrences of each classifier is then used to select a combination of classifiers. A classifier is counted when it has the largest contribution (highest detection score) to a correct detection (ground truth). Classifiers are not weighted by the occlusion level, so that large-region classifiers are not prioritized. Note that weighting is enabled when the selected classifiers are applied in the final detection system. The classifiers are evaluated on both the positive INRIA test set and the Dancefestival dataset. We compare normalized classifiers and set the threshold for each classifier at a fixed false-positive rate.

Table 1. Different combinations of classifiers.

  Dataset         Selection   Classifier numbers
  -               Manual      , 15 3, 9, , 28
  INRIA           Auto        1, 3, 14, 2, 10, 12, 27, 7, 8, 9, 13, 4, 26, 11, 5, 15, 19
  Dancefestival   Auto        9, 13, 27, 12, 14, 3, 15, 26, 1, 2, 10, 8, 4, 11, 5, 7, 6

The relative number of detections for each classifier is shown in Figure 7(a). Note that a low contribution of a classifier means that another classifier gives higher detection scores on the same detections, thereby making the non-contributing classifier inferior. From Figure 7(a), it can be seen that several classifiers are inferior for both datasets. The Dancefestival dataset contains more occluded persons, leading to a higher preference for small-region classifiers and left/right occlusions. We now combine a selection of classifiers based on this occurrence histogram and evaluate the performance of the classifier combination on the INRIA dataset. To assess the quality of the automated selection process, we also evaluate the performance of a manual selection of classifiers. The classifier numbers of the selections are listed in Table 1. The first row shows the manually selected classifiers, while rows two and three depict the selections generated from the INRIA and Dancefestival datasets, respectively. Note that the classifier numbers correspond to the numbers from Figure 6(a). Increasing the number of combined classifiers increases the detection performance, which converges at around 11 classifiers; beyond this point, adding more classifiers decreases the performance. In all three considered cases, a clear optimum occurs and from that point onwards, the performance always decreases when adding more classifiers. This implies that there is an optimal set of classifiers: additional classifiers only add already-found detections (redundancy) and therefore only generate false detections. With automated selection, the best results are obtained with 11 classifiers selected from the Dancefestival dataset.
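The occurrence-based selection can be sketched as follows. The data layout and function name are hypothetical, but the rule matches the text: each correctly detected ground-truth object is credited to the classifier with the highest detection score, without weighting by the occlusion level, and the most frequently credited classifiers are selected.

```python
from collections import Counter

def select_classifiers(matched_detections, num_select):
    """Rank classifiers by how often they contribute the best score.

    matched_detections: one inner list per ground-truth object, holding
    (classifier_id, score) pairs of detections that matched it. Scores
    are assumed calibrated (normalized classifiers with thresholds set
    at a common false positive rate).
    """
    credits = Counter()
    for candidates in matched_detections:
        if not candidates:
            continue
        best_id, _ = max(candidates, key=lambda c: c[1])
        credits[best_id] += 1   # unweighted: large regions get no priority
    return [cid for cid, _ in credits.most_common(num_select)]

# Toy example with three ground-truth objects and three classifiers.
matches = [
    [(0, 1.2), (9, 0.8)],   # classifier 0 gives the highest score
    [(9, 1.5), (13, 1.1)],  # classifier 9 wins
    [(9, 0.9)],             # classifier 9 wins again
]
print(select_classifiers(matches, 2))  # prints [9, 0]
```

An iterative variant that removes intermediate detections before re-counting, as suggested in the discussion, would replace the single counting pass with repeated rounds.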
However, an even better performance is obtained with the combinations of 7 and 17 manually selected classifiers. Note that the selection of the first classifier is most critical. This is clearly seen from the Dancefestival selection, where Classifier 9, modeling a significant amount of occlusion, is selected as the first classifier.

Figure 7. Combining multiple classifiers: the influence of individual classifiers on the total number of detections in (a) and the detection performance (AUC) of several classifier combinations in (b).

Figure 8. Example detections when applying 1 classifier in (a) vs. 17 classifiers in (b). Note that multiple classifiers enable the detection of significantly occluded persons.

5.5.3 Real-time implementation

Although this advanced occlusion handling increases the detection performance, adding more classifiers increases the computational cost linearly with the number of classifiers. To reduce this cost, we propose a cascaded implementation that limits the number of comparisons at each sliding-window search position. At each position, the largest-region classifier is evaluated first and only when its classification score is above a threshold, all other classifiers are evaluated. This already discards many search positions after applying the first classifier. We have evaluated both the computational complexity and the detection performance for a system with 1, 2, 4, 7 and 17 classifiers, when operated with different threshold values (t_class). The results for the manually selected classifiers are depicted in Figure 9. This figure visualizes both the computational cost and the detection performance, relative to the baseline system with one classifier. The computational cost is shown by the bars and is linked to the left vertical axis, where the value 100% represents the cost of the single-classifier baseline system. The detection performances of the different systems are shown by the lines and are linked to the right vertical axis, where the value 100% represents the performance of the single-classifier baseline system. Combining up to 17 classifiers, the performance always increases; however, combining more than 7 classifiers does not improve the performance significantly. Using 7 classifiers, the computational cost increases by 1.3%, while the associated detection performance increases by 7.6%. A suitable trade-off between performance and cost is to adopt 17 classifiers with an initial threshold of 0.2, leading to an improvement of 8% in detection performance for a 3.4% higher cost. This combination of classifiers detects more occluded objects, as shown in the example in Figure 8.
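A minimal sketch of this cascade, assuming a per-window scoring interface (the scoring functions below are stand-ins; only the evaluation order and the threshold t_class follow the text):

```python
def cascaded_scores(window_features, classifiers, t_class=0.2):
    """Evaluate the largest-region classifier first; the remaining
    classifiers run only when its score exceeds the threshold.

    classifiers: list of scoring functions, largest-region classifier
    first. Returns the (classifier_index, score) pairs evaluated.
    """
    first_score = classifiers[0](window_features)
    results = [(0, first_score)]
    if first_score <= t_class:
        return results                 # search position discarded early
    for idx, clf in enumerate(classifiers[1:], start=1):
        results.append((idx, clf(window_features)))
    return results

# Toy usage with constant-score stand-ins for trained classifiers:
weak = [lambda f, s=s: s for s in (0.1, 0.5, 0.9)]
print(len(cascaded_scores(None, weak)))    # prints 1: position discarded
strong = [lambda f, s=s: s for s in (0.5, 0.9, 0.3)]
print(len(cascaded_scores(None, strong)))  # prints 3: all classifiers run
```

Because most search positions contain background and fail the first test, the remaining classifiers run only on a small fraction of positions, which is why the measured cost increase stays at a few percent.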
Figure 9. Performance and computational costs for combinations of different numbers of classifiers. Detection performance is indicated by the lines (refer to axis at the right) and computational costs are indicated by the bars (refer to axis at the left). All classifier combinations are manually selected, as in Table 1.

6. DISCUSSION

We have evaluated two conceptual approaches for occlusion handling: detecting occluded object regions and detecting non-occluded regions. Although our experiments have shown that the detection of occlusions is not preferable, these results can be improved by using multiple image features. We have found that although Wang et al.12 describe the use of only HOG features for occlusion detection, their implementation combines HOG with LBP. Marín et al.13 show that extending HOG with LBP results in a significant improvement of the detection performance. Furthermore, our method for automated classifier selection is not optimal; a more elaborate selection procedure should select mutually exclusive classifiers to improve the combined detection performance.

We also want to place a critical note with the evaluation measures and parameters. Not all publications apply the same evaluation criteria and often, individual algorithms are based on different parameters (such as scales, dataset details and other algorithm settings). In order to obtain comparable evaluation results, Dollár et al.4,5 have made a framework for the objective evaluation of detection results. Unfortunately, comparing different implementations of the same algorithm is still influenced by the applied algorithmic parameters. The state-of-the-art detection performance on the INRIA dataset is obtained by Mathias et al.,22 who obtain a miss-rate of 16.62%. By adding occlusion handling, their miss-rate decreases to 13.70%, while the computational cost increases by 330%. In our system with 17 classifiers, the miss-rate is lowered from 33.88% to 31.05%, while the computations increase by only 3.4%. This performance difference is caused by the difference between our simple HOG features and the more discriminative features from Mathias et al.22 However, those features are specific to the object class and introduce additional computational complexity.
Finally, we want to remark that we discovered the work of Mathias et al.22 in the literature only when our own work was already being completed. Although this resulted in several similarities, it also shows that the concept of applying multiple classifiers to detect non-occluded object regions provides a suitable solution, now supported by two relatively independent investigations.

7. CONCLUSION

In this paper, we have proposed a novel system for occlusion handling and have integrated this in a sliding-window detection framework, using simple HOG features and linear classification. Occlusion handling is obtained by the combination of multiple classifiers, each covering a different level of occlusion. For real-time detection, our approach with 17 classifiers obtains an increase of 8% in detection performance with respect to the baseline system. We have proposed a cascaded implementation that increases the computational cost by only 3.4%. Although we only present results for pedestrian detection, our approach is not limited to this object class. Moreover, the fixed HOG feature transformation allows for an extension towards other object classes without additional class-specific feature calculation. Pre-defining the types of occlusions prior to training has the advantage that we do not need an additional training dataset covering all possible types of occlusions.

We have revealed that the effect of occlusion on the detection performance is position-dependent; for pedestrian detection, performance deteriorates mostly for occlusions around the head and knees. After implementing and evaluating the method by Wang et al.,12 we conclude that the largest contribution of the proposed occlusion handling is not caused by the cell-based occlusion detection and region merging, but originates from the addition of partial classifiers (upper/lower body). We have found that simply applying small-region classifiers that cover only a part of the object (e.g. a head-only detector), and therefore can handle more occlusion, strongly decreases the detection performance. Combining multiple classifiers increases the detection performance up to a certain optimal number of classifiers; adopting more classifiers beyond this point only adds already-found detections (redundancy) and generates false detections, thereby effectively decreasing the detection performance. We have proposed an automated selection method for classifiers, using statistics based on the occurrences of occlusions. Although this method performs better in some cases, the automatic selection is strongly dependent on the dataset and is regularly outperformed by the manual selection of classifiers. Besides this, the selection of the first classifier has been found to be most critical for the final system operation. Automated selection can be further improved by an iterative classifier selection method that removes intermediate detections. The combination of multiple classifiers enables the detection of persons that are strongly occluded.

REFERENCES

[1] Hoiem, D., Chodpathumwan, Y., and Dai, Q., Diagnosing error in object detectors, in [Proc. European Conference on Computer Vision (ECCV)], Springer (2012).
[2] Dalal, N. and Triggs, B., Histograms of oriented gradients for human detection, in [Proc. Conference on Computer Vision and Pattern Recognition (CVPR)], vol. 1 (2005).
[3] Cortes, C. and Vapnik, V., Support-vector networks, Machine Learning 20(3) (1995).
[4] Dollár, P., Wojek, C., Schiele, B., and Perona, P., Pedestrian detection: A benchmark, in [Proc. Conference on Computer Vision and Pattern Recognition (CVPR)], IEEE (2009).
[5] Dollár, P., Wojek, C., Schiele, B., and Perona, P., Pedestrian detection: An evaluation of the state of the art, Trans. Pattern Analysis and Machine Intelligence (PAMI) 34(4) (2012).
[6] Winn, J. and Shotton, J., The layout consistent random field for recognizing and segmenting partially occluded objects, in [Proc. Conference on Computer Vision and Pattern Recognition (CVPR)], vol. 1, 37-44, IEEE (2006).
[7] Monroy, A. and Ommer, B., Beyond bounding-boxes: Learning object shape by model-driven grouping, in [Proc. European Conference on Computer Vision (ECCV)], Springer (2012).
[8] Yang, Y., Hallman, S., Ramanan, D., and Fowlkes, C., Layered object detection for multi-class segmentation, in [Proc. Conference on Computer Vision and Pattern Recognition (CVPR)], IEEE (2010).
[9] Gould, S., Fulton, R., and Koller, D., Decomposing a scene into geometric and semantically consistent regions, in [Proc. International Conference on Computer Vision (ICCV)], 1-8, IEEE (2009).
[10] Gould, S., Gao, T., and Koller, D., Region-based segmentation and object detection, in [Advances in Neural Information Processing Systems] (2009).
[11] Gao, T., Packer, B., and Koller, D., A segmentation-aware object detection model with occlusion handling, in [Proc. Conference on Computer Vision and Pattern Recognition (CVPR)], IEEE (2011).
[12] Wang, X., Han, T., and Yan, S., An HOG-LBP human detector with partial occlusion handling, in [Proc. International Conference on Computer Vision (ICCV)] (2009).
[13] Marín, J., Vázquez, D., López, A. M., Amores, J., and Kuncheva, L., Occlusion handling via random subspace classifiers for human detection, Trans. Systems, Man, and Cybernetics (Part B) 44(3) (2014).
[14] Walk, S., Majer, N., Schindler, K., and Schiele, B., New features and insights for pedestrian detection, in [Proc. Conference on Computer Vision and Pattern Recognition (CVPR)], IEEE (2010).
[15] Dalal, N., Triggs, B., and Schmid, C., Human detection using oriented histograms of flow and appearance, in [Proc. European Conference on Computer Vision (ECCV)], Springer (2006).
[16] Felzenszwalb, P., Girshick, R., McAllester, D., and Ramanan, D., Object detection with discriminatively trained part-based models, Trans. Pattern Analysis and Machine Intelligence (PAMI) 32(9) (2010).
[17] Leibe, B., Seemann, E., and Schiele, B., Pedestrian detection in crowded scenes, in [Proc. Conference on Computer Vision and Pattern Recognition (CVPR)], vol. 1, IEEE (2005).
[18] Mikolajczyk, K., Schmid, C., and Zisserman, A., Human detection based on a probabilistic assembly of robust part detectors, in [Proc. European Conference on Computer Vision (ECCV)], 69-82, Springer (2004).
[19] Wu, B. and Nevatia, R., Detection of multiple, partially occluded humans in a single image by Bayesian combination of edgelet part detectors, in [Proc. International Conference on Computer Vision (ICCV)], vol. 1, 90-97, IEEE (2005).
[20] Fergus, R., Perona, P., and Zisserman, A., Object class recognition by unsupervised scale-invariant learning, in [Proc. Conference on Computer Vision and Pattern Recognition (CVPR)], vol. 2, II-264, IEEE (2003).
[21] Tang, S., Andriluka, M., and Schiele, B., Detection and tracking of occluded people, International Journal of Computer Vision, 1-12 (2012).
[22] Mathias, M., Benenson, R., Timofte, R., and Van Gool, L., Handling occlusions with franken-classifiers, in [Proc. International Conference on Computer Vision (ICCV)] (2013).
[23] Cheng, Y., Mean shift, mode seeking, and clustering, Trans. Pattern Analysis and Machine Intelligence (PAMI) 17(8) (1995).
[24] Dalal, N., Finding people in images and videos, PhD thesis, Institut National Polytechnique de Grenoble (INPG) (2006).


More information

Fast Human Detection Algorithm Based on Subtraction Stereo for Generic Environment

Fast Human Detection Algorithm Based on Subtraction Stereo for Generic Environment Fast Human Detection Algorithm Based on Subtraction Stereo for Generic Environment Alessandro Moro, Makoto Arie, Kenji Terabayashi and Kazunori Umeda University of Trieste, Italy / CREST, JST Chuo University,

More information

Visual Detection and Species Classification of Orchid Flowers

Visual Detection and Species Classification of Orchid Flowers 14-22 MVA2015 IAPR International Conference on Machine Vision Applications, May 18-22, 2015, Tokyo, JAPAN Visual Detection and Species Classification of Orchid Flowers Steven Puttemans & Toon Goedemé KU

More information

Histograms of Oriented Gradients

Histograms of Oriented Gradients Histograms of Oriented Gradients Carlo Tomasi September 18, 2017 A useful question to ask of an image is whether it contains one or more instances of a certain object: a person, a face, a car, and so forth.

More information

Articulated Pose Estimation with Flexible Mixtures-of-Parts

Articulated Pose Estimation with Flexible Mixtures-of-Parts Articulated Pose Estimation with Flexible Mixtures-of-Parts PRESENTATION: JESSE DAVIS CS 3710 VISUAL RECOGNITION Outline Modeling Special Cases Inferences Learning Experiments Problem and Relevance Problem:

More information

The Pennsylvania State University. The Graduate School. College of Engineering ONLINE LIVESTREAM CAMERA CALIBRATION FROM CROWD SCENE VIDEOS

The Pennsylvania State University. The Graduate School. College of Engineering ONLINE LIVESTREAM CAMERA CALIBRATION FROM CROWD SCENE VIDEOS The Pennsylvania State University The Graduate School College of Engineering ONLINE LIVESTREAM CAMERA CALIBRATION FROM CROWD SCENE VIDEOS A Thesis in Computer Science and Engineering by Anindita Bandyopadhyay

More information

the relatedness of local regions. However, the process of quantizing a features into binary form creates a problem in that a great deal of the informa

the relatedness of local regions. However, the process of quantizing a features into binary form creates a problem in that a great deal of the informa Binary code-based Human Detection Yuji Yamauchi 1,a) Hironobu Fujiyoshi 1,b) Abstract: HOG features are effective for object detection, but their focus on local regions makes them highdimensional features.

More information

Recognition of Animal Skin Texture Attributes in the Wild. Amey Dharwadker (aap2174) Kai Zhang (kz2213)

Recognition of Animal Skin Texture Attributes in the Wild. Amey Dharwadker (aap2174) Kai Zhang (kz2213) Recognition of Animal Skin Texture Attributes in the Wild Amey Dharwadker (aap2174) Kai Zhang (kz2213) Motivation Patterns and textures are have an important role in object description and understanding

More information

Histogram of Oriented Gradients (HOG) for Object Detection

Histogram of Oriented Gradients (HOG) for Object Detection Histogram of Oriented Gradients (HOG) for Object Detection Navneet DALAL Joint work with Bill TRIGGS and Cordelia SCHMID Goal & Challenges Goal: Detect and localise people in images and videos n Wide variety

More information

Graph Matching Iris Image Blocks with Local Binary Pattern

Graph Matching Iris Image Blocks with Local Binary Pattern Graph Matching Iris Image Blocs with Local Binary Pattern Zhenan Sun, Tieniu Tan, and Xianchao Qiu Center for Biometrics and Security Research, National Laboratory of Pattern Recognition, Institute of

More information

2D Image Processing Feature Descriptors

2D Image Processing Feature Descriptors 2D Image Processing Feature Descriptors Prof. Didier Stricker Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de 1 Overview

More information

Object Recognition. Computer Vision. Slides from Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce

Object Recognition. Computer Vision. Slides from Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce Object Recognition Computer Vision Slides from Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce How many visual object categories are there? Biederman 1987 ANIMALS PLANTS OBJECTS

More information

CS229: Action Recognition in Tennis

CS229: Action Recognition in Tennis CS229: Action Recognition in Tennis Aman Sikka Stanford University Stanford, CA 94305 Rajbir Kataria Stanford University Stanford, CA 94305 asikka@stanford.edu rkataria@stanford.edu 1. Motivation As active

More information

String distance for automatic image classification

String distance for automatic image classification String distance for automatic image classification Nguyen Hong Thinh*, Le Vu Ha*, Barat Cecile** and Ducottet Christophe** *University of Engineering and Technology, Vietnam National University of HaNoi,

More information

SURF. Lecture6: SURF and HOG. Integral Image. Feature Evaluation with Integral Image

SURF. Lecture6: SURF and HOG. Integral Image. Feature Evaluation with Integral Image SURF CSED441:Introduction to Computer Vision (2015S) Lecture6: SURF and HOG Bohyung Han CSE, POSTECH bhhan@postech.ac.kr Speed Up Robust Features (SURF) Simplified version of SIFT Faster computation but

More information

Histogram of Oriented Gradients for Human Detection

Histogram of Oriented Gradients for Human Detection Histogram of Oriented Gradients for Human Detection Article by Navneet Dalal and Bill Triggs All images in presentation is taken from article Presentation by Inge Edward Halsaunet Introduction What: Detect

More information

Recent Researches in Automatic Control, Systems Science and Communications

Recent Researches in Automatic Control, Systems Science and Communications Real time human detection in video streams FATMA SAYADI*, YAHIA SAID, MOHAMED ATRI AND RACHED TOURKI Electronics and Microelectronics Laboratory Faculty of Sciences Monastir, 5000 Tunisia Address (12pt

More information

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS Cognitive Robotics Original: David G. Lowe, 004 Summary: Coen van Leeuwen, s1460919 Abstract: This article presents a method to extract

More information

Efficient Acquisition of Human Existence Priors from Motion Trajectories

Efficient Acquisition of Human Existence Priors from Motion Trajectories Efficient Acquisition of Human Existence Priors from Motion Trajectories Hitoshi Habe Hidehito Nakagawa Masatsugu Kidode Graduate School of Information Science, Nara Institute of Science and Technology

More information

Face Detection and Alignment. Prof. Xin Yang HUST

Face Detection and Alignment. Prof. Xin Yang HUST Face Detection and Alignment Prof. Xin Yang HUST Many slides adapted from P. Viola Face detection Face detection Basic idea: slide a window across image and evaluate a face model at every location Challenges

More information

Robust PDF Table Locator

Robust PDF Table Locator Robust PDF Table Locator December 17, 2016 1 Introduction Data scientists rely on an abundance of tabular data stored in easy-to-machine-read formats like.csv files. Unfortunately, most government records

More information

Pairwise Threshold for Gaussian Mixture Classification and its Application on Human Tracking Enhancement

Pairwise Threshold for Gaussian Mixture Classification and its Application on Human Tracking Enhancement Pairwise Threshold for Gaussian Mixture Classification and its Application on Human Tracking Enhancement Daegeon Kim Sung Chun Lee Institute for Robotics and Intelligent Systems University of Southern

More information

Pedestrian and Part Position Detection using a Regression-based Multiple Task Deep Convolutional Neural Network

Pedestrian and Part Position Detection using a Regression-based Multiple Task Deep Convolutional Neural Network Pedestrian and Part Position Detection using a Regression-based Multiple Tas Deep Convolutional Neural Networ Taayoshi Yamashita Computer Science Department yamashita@cs.chubu.ac.jp Hiroshi Fuui Computer

More information

Person Detection in Images using HoG + Gentleboost. Rahul Rajan June 1st July 15th CMU Q Robotics Lab

Person Detection in Images using HoG + Gentleboost. Rahul Rajan June 1st July 15th CMU Q Robotics Lab Person Detection in Images using HoG + Gentleboost Rahul Rajan June 1st July 15th CMU Q Robotics Lab 1 Introduction One of the goals of computer vision Object class detection car, animal, humans Human

More information

Modern Object Detection. Most slides from Ali Farhadi

Modern Object Detection. Most slides from Ali Farhadi Modern Object Detection Most slides from Ali Farhadi Comparison of Classifiers assuming x in {0 1} Learning Objective Training Inference Naïve Bayes maximize j i logp + logp ( x y ; θ ) ( y ; θ ) i ij

More information

A Boosted Multi-Task Model for Pedestrian Detection with Occlusion Handling

A Boosted Multi-Task Model for Pedestrian Detection with Occlusion Handling Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence A Boosted Multi-Task Model for Pedestrian Detection with Occlusion Handling Chao Zhu and Yuxin Peng Institute of Computer Science

More information

Ensemble of Bayesian Filters for Loop Closure Detection

Ensemble of Bayesian Filters for Loop Closure Detection Ensemble of Bayesian Filters for Loop Closure Detection Mohammad Omar Salameh, Azizi Abdullah, Shahnorbanun Sahran Pattern Recognition Research Group Center for Artificial Intelligence Faculty of Information

More information

Find that! Visual Object Detection Primer

Find that! Visual Object Detection Primer Find that! Visual Object Detection Primer SkTech/MIT Innovation Workshop August 16, 2012 Dr. Tomasz Malisiewicz tomasz@csail.mit.edu Find that! Your Goals...imagine one such system that drives information

More information

Human detection solution for a retail store environment

Human detection solution for a retail store environment FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO Human detection solution for a retail store environment Vítor Araújo PREPARATION OF THE MSC DISSERTATION Mestrado Integrado em Engenharia Eletrotécnica

More information

Region-based Segmentation and Object Detection

Region-based Segmentation and Object Detection Region-based Segmentation and Object Detection Stephen Gould Tianshi Gao Daphne Koller Presented at NIPS 2009 Discussion and Slides by Eric Wang April 23, 2010 Outline Introduction Model Overview Model

More information

Sketchable Histograms of Oriented Gradients for Object Detection

Sketchable Histograms of Oriented Gradients for Object Detection Sketchable Histograms of Oriented Gradients for Object Detection No Author Given No Institute Given Abstract. In this paper we investigate a new representation approach for visual object recognition. The

More information

OCCLUSION BOUNDARIES ESTIMATION FROM A HIGH-RESOLUTION SAR IMAGE

OCCLUSION BOUNDARIES ESTIMATION FROM A HIGH-RESOLUTION SAR IMAGE OCCLUSION BOUNDARIES ESTIMATION FROM A HIGH-RESOLUTION SAR IMAGE Wenju He, Marc Jäger, and Olaf Hellwich Berlin University of Technology FR3-1, Franklinstr. 28, 10587 Berlin, Germany {wenjuhe, jaeger,

More information

Fast Human Detection Using a Cascade of Histograms of Oriented Gradients

Fast Human Detection Using a Cascade of Histograms of Oriented Gradients MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Fast Human Detection Using a Cascade of Histograms of Oriented Gradients Qiang Zhu, Shai Avidan, Mei-Chen Yeh, Kwang-Ting Cheng TR26-68 June

More information

Bootstrapping Boosted Random Ferns for Discriminative and Efficient Object Classification

Bootstrapping Boosted Random Ferns for Discriminative and Efficient Object Classification Pattern Recognition Pattern Recognition (22) 5 Bootstrapping Boosted Random Ferns for Discriminative and Efficient Object Classification M. Villamizar, J. Andrade-Cetto, A. Sanfeliu and F. Moreno-Noguer

More information

Part-based and local feature models for generic object recognition

Part-based and local feature models for generic object recognition Part-based and local feature models for generic object recognition May 28 th, 2015 Yong Jae Lee UC Davis Announcements PS2 grades up on SmartSite PS2 stats: Mean: 80.15 Standard Dev: 22.77 Vote on piazza

More information

Pedestrian Detection with Occlusion Handling

Pedestrian Detection with Occlusion Handling Pedestrian Detection with Occlusion Handling Yawar Rehman 1, Irfan Riaz 2, Fan Xue 3, Jingchun Piao 4, Jameel Ahmed Khan 5 and Hyunchul Shin 6 Department of Electronics and Communication Engineering, Hanyang

More information

Face detection and recognition. Detection Recognition Sally

Face detection and recognition. Detection Recognition Sally Face detection and recognition Detection Recognition Sally Face detection & recognition Viola & Jones detector Available in open CV Face recognition Eigenfaces for face recognition Metric learning identification

More information

CS 223B Computer Vision Problem Set 3

CS 223B Computer Vision Problem Set 3 CS 223B Computer Vision Problem Set 3 Due: Feb. 22 nd, 2011 1 Probabilistic Recursion for Tracking In this problem you will derive a method for tracking a point of interest through a sequence of images.

More information

Pedestrian Detection Using Structured SVM

Pedestrian Detection Using Structured SVM Pedestrian Detection Using Structured SVM Wonhui Kim Stanford University Department of Electrical Engineering wonhui@stanford.edu Seungmin Lee Stanford University Department of Electrical Engineering smlee729@stanford.edu.

More information

Object Detection Design challenges

Object Detection Design challenges Object Detection Design challenges How to efficiently search for likely objects Even simple models require searching hundreds of thousands of positions and scales Feature design and scoring How should

More information

[2008] IEEE. Reprinted, with permission, from [Yan Chen, Qiang Wu, Xiangjian He, Wenjing Jia,Tom Hintz, A Modified Mahalanobis Distance for Human

[2008] IEEE. Reprinted, with permission, from [Yan Chen, Qiang Wu, Xiangjian He, Wenjing Jia,Tom Hintz, A Modified Mahalanobis Distance for Human [8] IEEE. Reprinted, with permission, from [Yan Chen, Qiang Wu, Xiangian He, Wening Jia,Tom Hintz, A Modified Mahalanobis Distance for Human Detection in Out-door Environments, U-Media 8: 8 The First IEEE

More information

Histograms of Oriented Gradients for Human Detection p. 1/1

Histograms of Oriented Gradients for Human Detection p. 1/1 Histograms of Oriented Gradients for Human Detection p. 1/1 Histograms of Oriented Gradients for Human Detection Navneet Dalal and Bill Triggs INRIA Rhône-Alpes Grenoble, France Funding: acemedia, LAVA,

More information

Close-Range Human Detection for Head-Mounted Cameras

Close-Range Human Detection for Head-Mounted Cameras D. MITZEL, B. LEIBE: CLOSE-RANGE HUMAN DETECTION FOR HEAD CAMERAS Close-Range Human Detection for Head-Mounted Cameras Dennis Mitzel mitzel@vision.rwth-aachen.de Bastian Leibe leibe@vision.rwth-aachen.de

More information

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011 Previously Part-based and local feature models for generic object recognition Wed, April 20 UT-Austin Discriminative classifiers Boosting Nearest neighbors Support vector machines Useful for object recognition

More information