Semantic Visual Decomposition Modelling for Improving Object Detection in Complex Scene Images

Ge Qin, Department of Computing, University of Surrey, United Kingdom
Bogdan Vrusias, Department of Computing, University of Surrey, United Kingdom

Abstract: We propose a systematic method for constructing a compositional model for recognising object instances in images of real-life scenes. The model is trained on a set of visual examples of the objects contained in given images, in order to capture the visual characteristics of those objects and to derive the spatial relationships between the key internal sub-components of each object instance. The recognition method extracts visual similarities at the component level in three feature spaces: the histogram of boundary distribution, the intensity histogram, and the histogram of oriented gradient (HOG). Principal Component Analysis (PCA) is used for component selection and feature weighting. The proposed recognition method not only improves the accuracy of popular object detection algorithms, but also offers a systematic way of generating detection models.

Keywords: Contextual object recognition, semantic object modelling, visual object decomposition.

I. INTRODUCTION

Visual recognition has been one of the most popular research areas in computer vision for the last half century. The research community nowadays focuses on semantically understanding objects and their surrounding environment, beyond visual appearance alone. Human beings are naturally capable of identifying both visual and semantic similarities in a given set of images. On one hand, we are able to extract similarities in shape, colour, texture or patterns in other photometric domains; on the other hand, we can interpret contextual information beyond visual appearance and associate objects or scenes based on their semantic similarities. This combination of visual and semantic analysis gives us the flexibility to select which information, visual or semantic, to use for a particular recognition task.

It is unquestionable that better visual processing techniques provide better object recognition results and further simplify the derivation of semantics: improving the performance of classic image processing techniques [1, 2] has a direct impact on the performance of object recognition. Compared with classic content-based information retrieval (CBIR) systems [3, 4], research interest has gradually moved from specific context-based object retrieval towards generic knowledge-based scene understanding [5, 6], focusing on visual analysis over images using queries relating to visual features and compositions of visual features.

Composition-based recognition is a commonly accepted way of exploiting prior knowledge about the detection model in the form of parts and the relationships between them [7, 8]. Borenstein [9] proposed a recognition system that extracts a cow or a runner from its natural background by combining visual-similarity-driven bottom-up segment stitching with knowledge-driven top-down splitting. Although the recognition only works on simple data, i.e. a single object that is visually distinctive from the background, it provides a way to systematically recognise small, visually descriptive pieces and group them into semantically descriptive objects guided by a model template.
It can be considered a first step towards deriving high-level object knowledge by analysing low-level visual descriptions. Oliva and Torralba's research reveals that the statistical structure within the processed images plays a fundamental role in generic scene understanding [10, 11]. Boutell's work [12, 13] in natural scene recognition analyses the trend of the spatial colour moments and uses it as a semantic feature to recognise outdoor scenes. He also developed a generative model to monitor pair-wise spatial relationships between the semantic objects appearing in a scene instance. Currently, most scene understanding is performed on long-distance natural landscape scenes. Such a scene domain is advantageous because the semantic features are monolithic and normally apply to the whole image. Furthermore, segmentation of long-distance scene images usually outputs fewer regions, which simplifies the spatial relationship analysis between those regions. However, this scene understanding approach is difficult to apply to the recognition of structured objects in indoor or closed scenes, which contain more detailed semantic relationships within an object or between objects.

The present work attempts to improve recognition performance over existing image processing techniques by adding systematically extracted semantic information about the objects detected in the image. An object model is trained in a supervised fashion [14, 15], and the visually distinctive features within each key component of the detected object are extracted and weighted accordingly. As shown in Figure 1, the recognition process is split into two stages: Hypothesis Generation and Hypothesis Validation. Hypothesis Generation produces image patches that have an overall similarity to the object model; Hypothesis Validation examines the visual appearance and spatial relationships of the components inside each generated hypothesis to determine whether sufficient detail has been extracted to declare a recognition. Unlike similar research that focuses mainly on natural landscape scenes, the presented work focuses on street scenes with structured objects, where semantic relationships are embedded within the image details and are more consistent than general landscape themes.

Fig. 1. Object recognition proposal.

II. MODEL CONSTRUCTION

Many cutting-edge composition-based recognition methods focus on building a codebook containing a large number of discriminative local features to describe the detected object. In this work, instead of attempting to recognise a complicated structured object directly from arbitrary local features, we propose an intermediate stage that fuses the low-level visual information into components carrying basic semantic information. The overall recognition of an object then depends on the successful recognition of several of its key sub-components, which makes the recognition less dependent on the overall visual appearance of the detected object. The object model holds information at two feature levels: at the global level, it captures the boundary distribution of the entire object; at the local level, it records the visual patterns of every component in the object's collection of components. Each component is represented using three visual feature descriptors: the boundary distribution (BND), the histogram of oriented gradient (HOG), and the intensity histogram (INT). The model is constructed in a supervised approach, where the detected object and its inner components are labelled in the training samples. The feature map for each component is built in an unsupervised way, in which visual similarities among the training samples over a specific feature space are extracted; the feature map only records the most distinctive features shared among most of the training samples.

A. Object Decomposition

Object decomposition breaks the targeted contextual objects into visually simple but semantically meaningful components. Such a compositional approach enables us to construct a hierarchical knowledge model for the detected object, containing the visual semantics (i.e. a visual grammar) of that object. In this work, the decomposition rule is derived by modelling the labelled inner-city street scene image samples from the MIT LabelMe dataset [16].
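As an illustration of the two-level model layout described in this section, the following Python sketch shows one plausible data structure; the field names and types are our own illustrative choices, not taken from the paper.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class ComponentModel:
    """Feature maps for one labelled sub-component (e.g. wheel, rim, window)."""
    bnd_mean: np.ndarray   # mean boundary-distribution vector over the samples
    bnd_std: np.ndarray    # per-element spread, used later for distance normalisation
    hog_mean: np.ndarray   # mean 9-bin HOG vector
    hog_std: np.ndarray
    int_mean: np.ndarray   # mean 32-bin intensity histogram
    int_std: np.ndarray
    weights: dict          # per-feature-space weighting, e.g. {'bnd': 0.4, ...}
    rel_location: tuple    # expected offset relative to the object window
    rel_size: tuple        # expected size relative to the object window

@dataclass
class ObjectModel:
    """Two-level model: a global boundary map plus per-component feature maps."""
    global_boundary: np.ndarray                     # boundary map of the whole object
    components: dict = field(default_factory=dict)  # component name -> ComponentModel
```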
B. Feature Extraction

Shape is a robust feature against photometric variations. In this work, the PB boundary detector [17] is used to extract boundaries in the processed image. From the output of the PB boundary detection, we accumulate the boundary maps to compute a histogram map of boundary distribution, and use it to monitor the similarity of boundary orientation shared among individual object instances. Every point in the histogram map of boundary distribution is assigned a value indicating the likelihood of detecting a boundary point at that location [14]. The map is used as the global-level feature descriptor to generate recognition hypotheses, and also as one of the local-level descriptors to validate the generated hypotheses.

The Histogram of Oriented Gradient (HOG) is another commonly used descriptor, which captures local feature appearance by analysing the distribution of intensity gradients over a targeted area of interest [18]. HOG performs well in capturing strong directional features in localised regions. Following Felzenszwalb's approach [19], we apply the gradient filter kernels [-1, 0, 1] and its transpose over each 8x8 sub-region of the target gray-scale image using a sliding window approach. The gradient magnitude of each pixel is summarised into a one-dimensional 9-bin histogram, each bin recording the gradient intensity at a specific direction. Further normalisation adjusts the gradient histogram vector according to its surrounding windows. Finally, we accumulate the HOG distributions over all samples of a given component to derive the similarity in oriented gradient for that component.

The intensity distribution is tracked and monitored using gray-scale intensity histograms in a 32-bin feature vector. Because the intensity distribution is unstable at the global object level, intensity analysis is only applied at the component level. The intensity similarity is calculated between the image patch and the model, then multiplied by the customised weighting to compute the recognition confidence for the component, which in turn contributes to the overall object recognition score.
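A minimal sketch of the 9-bin gradient histogram described above, assuming a single gray-scale patch as input; the unsigned-orientation binning follows the standard HOG formulation rather than any code released with the paper.

```python
import numpy as np

def hog_9bin(patch: np.ndarray) -> np.ndarray:
    """Compute a 9-bin histogram of oriented gradients for one gray-scale patch.

    patch: 2-D float array of intensities (e.g. an 8x8 sub-region).
    Returns an L2-normalised 9-element histogram over unsigned orientations.
    """
    # Gradients via the [-1, 0, 1] kernel and its transpose.
    gx = np.zeros_like(patch)
    gy = np.zeros_like(patch)
    gx[:, 1:-1] = patch[:, 2:] - patch[:, :-2]
    gy[1:-1, :] = patch[2:, :] - patch[:-2, :]

    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180), quantised into 9 bins of 20 degrees each.
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((orientation / 20.0).astype(int), 8)

    hist = np.zeros(9)
    np.add.at(hist, bins.ravel(), magnitude.ravel())  # accumulate gradient energy per bin

    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```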

C. Component Selection

Since an object's sub-components are defined by the manual annotations provided in the sample image dataset, each component has its own distinctiveness and therefore contributes differently to the object recognition process. Principal Component Analysis (PCA) is used to extract the set of key principal components that dominate the object recognition. As explained in the previous section, we take into consideration three features for every component, i.e. the boundary distribution, the intensity distribution and the histogram of oriented gradient (HOG), together with three pieces of relational information, i.e. occurrence frequency, relative location, and relative size. During the encoding process, each component is converted into a 55-element feature vector (9 elements for the boundary distribution, 32 for the intensity histogram, 9 for the HOG distribution, 1 for occurrence, 2 for relative location, and 2 for relative size). For each element in the feature vector, we calculate the difference between the element value x_i and its mean value mu_i, then normalise the difference by dividing it by two standard deviations. We discard elements that reside outside two standard deviations, thereby covering 95% of the sample data. The relevance score for each component is then calculated by averaging the normalised distances over the n elements of the feature vector, as shown in (1):

r = (1/n) * sum_i |x_i - mu_i| / (2 * sigma_i)    (1)

Every vehicle sample is converted into an N-element vector, each element representing the relevance score of one component. To balance performance against computation overhead, we select only the top six components, which cumulatively contribute to the recognition of 77% of the samples: wheel, rim, window, tail light, head light, and windshield.

D. Component Recognition

To recognise a targeted component, a set of similarity measures is computed over the different feature spaces between the component candidate and the feature maps stored in the model. For each feature space, we convert the extracted feature into a 1-D vector and measure its correlation against the mean vector. The boundary map of a given sample is divided into 8-by-8 pixel windows; within each window, we compute an overall score by dividing the total intensity energy by the total number of edge points in that window, generating a 1-D vector with one element per window. Similarly, for the histogram of oriented gradient (HOG), every window is represented by a 9-bin 1-D vector, each bin representing a direction. For colour intensity, we convert the intensity distribution of each of the R, G and B bands into a 32-bin array.

For each feature of any given component, a mean vector is computed across the complete sample set. The normalised Euclidean distance, i.e. the Mahalanobis distance with a diagonal covariance, shown in (2), is calculated between every sample x and the mean vector to measure how close the sample is to the centroid of the entire sample set:

D(x) = sqrt( sum_i (x_i - mu_i)^2 / sigma_i^2 )    (2)

We then compute the standard deviation of the Mahalanobis distances to represent the spread of individual samples around the mean; the standard deviation is inverted and rescaled into a value between 0 and 1, as shown in (3), and used as the weighting score in the processed feature space for that component. Successful recognition of an individual component contributes towards the recognition of the whole object.
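A short sketch of the diagonal-covariance Mahalanobis distance of (2), with one plausible inversion of the distance spread into a [0, 1] weight; the exact mapping used for (3) is not recoverable from the text, so the 1/(1 + sigma) form below is our assumption.

```python
import numpy as np

def mahalanobis_diag(x: np.ndarray, mean: np.ndarray, std: np.ndarray) -> float:
    """Normalised Euclidean distance of a feature vector to the sample mean (eq. 2)."""
    eps = 1e-8  # guard against zero-variance elements
    return float(np.sqrt(np.sum(((x - mean) / (std + eps)) ** 2)))

def component_weight(samples: np.ndarray) -> float:
    """Weight a feature space by how tightly its training samples cluster (eq. 3).

    samples: (num_samples, dim) matrix of training feature vectors for one component.
    Returns a value in (0, 1]; tight clusters (small spread) get weights near 1.
    The 1/(1 + sigma) mapping is an assumption, not taken from the paper.
    """
    mean = samples.mean(axis=0)
    std = samples.std(axis=0)
    dists = np.array([mahalanobis_diag(s, mean, std) for s in samples])
    return 1.0 / (1.0 + dists.std())
```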
Assessment of the semantic relationships is also carried out between the component candidate and the other identified components. Figure 2 shows the spatial relationships between the filtered key components within the detected object. The relative size of each component is also monitored, to ensure that the recognition of each individual component is consistent with the recognition of the whole contextual object. This mutual spatial map, together with the relative size restriction, significantly reduces the search domain for the remaining components once one component has been identified, and therefore improves detection efficiency considerably; the sketch below illustrates the idea.

Fig. 2. Boundary & HOG distribution for each component (headlight, taillight, wheel, rim, window, shield).
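A minimal sketch of how such a pairwise spatial constraint could prune the search: given one identified component, the expected relative offset and a tolerance bound the region scanned for the next component. The helper name, the tolerance value, and the fractional-offset convention are illustrative assumptions.

```python
def candidate_region(found_xy, rel_offset, obj_size, tolerance=0.05):
    """Restrict the search window for component B given a detection of component A.

    found_xy:   (x, y) centre of the already-identified component.
    rel_offset: expected (dx, dy) from A to B, as fractions of the object window.
    obj_size:   (width, height) of the current object hypothesis window.
    tolerance:  allowed deviation, also as a fraction of the object size.
    Returns (x_min, y_min, x_max, y_max) of the reduced search region.
    """
    w, h = obj_size
    cx = found_xy[0] + rel_offset[0] * w   # expected centre of component B
    cy = found_xy[1] + rel_offset[1] * h
    return (cx - tolerance * w, cy - tolerance * h,
            cx + tolerance * w, cy + tolerance * h)

# Example: having found a wheel, only a small band of the hypothesis window
# needs to be scanned for the rim, rather than the whole image patch.
rim_region = candidate_region(found_xy=(120, 210), rel_offset=(0.0, 0.0),
                              obj_size=(200, 80))
```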

III. OBJECT RECOGNITION

The object recognition process is divided into two stages: an approximation process, Hypothesis Generation, is first applied to quickly restrict the search areas; a more comprehensive matching, Hypothesis Validation, is then performed to verify the generated hypotheses by recognising each component and examining their inter-spatial relationships. Thresholding is applied to determine when recognition in a specific feature space is achieved. The standard deviation is computed between the feature maps of individual samples and the feature means stored in the object model. Thresholds are set dynamically, depending on whether recognition is applied at the global object level or at the local component level. For hypothesis generation at the global object level, we set the threshold to within 3 standard deviations of the mean, to include the maximum number of true positive hypotheses. For hypothesis validation, we set the threshold for each component in each feature space to within 1 standard deviation of the mean, to filter out as many false positive hypotheses as possible and thereby increase the recognition accuracy.

A. Hypothesis Generation

For hypothesis generation, an exhaustive sliding-window search is applied over a set of scales, scanning the whole image to generate potential object hypotheses. The set of scales is pre-defined, covering from 5% to 50% of the size of the processed image. Boundary detection is first applied to each image patch to extract its boundary map, as shown in Figure 3: each candidate window (top row) is compared against the boundary distribution map of the global object stored in the object model (bottom row). The boundary map is segmented into 8x8 pixel windows, and each window is compared against the corresponding boundary window stored in the model. For every pixel within a window W at location (x, y), we measure the boundary intensity difference between the sample map B_s and the model map B_m, summing the difference over every point in the window and dividing by the total number of points in that window, as shown in (4):

d(x, y) = (1/|W|) * sum_{p in W} |B_s(p) - B_m(p)|    (4)

Fig. 3. Hypothesis validation for boundary matching.

The processed sample is thus converted into a 1-D vector, each element representing the boundary intensity difference of the corresponding window at location (x, y). A mean boundary vector is extracted in the same way from the model's boundary histogram map. The boundary distribution match for the object is calculated as the Mahalanobis distance between the two vectors, in the same form as (2), shown in (5). Every sample that passes the pre-defined matching threshold is considered a potential object candidate; the set of candidates output by the hypothesis generation is then passed to the hypothesis validation process to verify the recognition.
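A compact sketch of the multi-scale sliding-window stage, reusing mahalanobis_diag from the component recognition sketch above; the stride, the scale set, the canonical model size, and the resampling step are our own assumptions, while the 3-standard-deviation threshold is taken from the text.

```python
import numpy as np

def resample_to_model(patch, size=(64, 128)):
    """Nearest-neighbour resample onto the model's canonical grid (illustrative)."""
    ys = np.linspace(0, patch.shape[0] - 1, size[0]).astype(int)
    xs = np.linspace(0, patch.shape[1] - 1, size[1]).astype(int)
    return patch[np.ix_(ys, xs)]

def window_scores(patch, win=8):
    """Average boundary energy of each win x win cell, flattened to 1-D (eq. 4)."""
    h, w = (patch.shape[0] // win) * win, (patch.shape[1] // win) * win
    cells = patch[:h, :w].reshape(h // win, win, w // win, win)
    return cells.mean(axis=(1, 3)).ravel()

def generate_hypotheses(boundary_map, model_vec, model_std,
                        scales=(0.05, 0.1, 0.2, 0.35, 0.5), stride=8):
    """Scan the image boundary map at several window scales (eqs. 4-5).

    Yields (x, y, w, h, distance) for windows within the generation threshold.
    """
    H, W = boundary_map.shape
    for s in scales:
        w, h = int(W * s), int(H * s)
        for y in range(0, H - h + 1, stride):
            for x in range(0, W - w + 1, stride):
                patch = resample_to_model(boundary_map[y:y + h, x:x + w])
                vec = window_scores(patch)                       # eq. (4)
                d = mahalanobis_diag(vec, model_vec, model_std)  # eq. (5), defined earlier
                if d <= 3.0:  # 3-standard-deviation generation threshold
                    yield (x, y, w, h, d)
```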
B. Hypothesis Validation

Hypothesis generation provides a set of locations with a high probability of containing object instances. All hypotheses that passed the threshold during hypothesis generation are treated as potential object candidates and decomposed into sub-regions according to the object model for further validation. Hypothesis validation examines the corresponding sub-regions of each extracted hypothesis and attempts to validate it by identifying its essential sub-components according to the object model. The component recognition results are then consolidated to validate the recognition of the whole object. Recognition at the local component level is carried out in three feature spaces: the boundary distribution (BND), the histogram of oriented gradient (HOG), and the intensity histogram (INT).

The validation of the boundary distribution for each component is similar to the boundary distribution matching at the global object level. For any particular component, we divide the boundary map into 8x8 pixel windows and calculate the boundary distribution for each window at location (x, y). We then compute the boundary distribution match between the processed image region and the model as a Mahalanobis distance (6). For the histogram of oriented gradient, we compute the HOG feature for every 8x8 pixel window of the component to generate a HOG map; within each window, we use the direction with the highest gradient intensity to represent the gradient of the window (7), and the HOG match between sample and model is again calculated as a Mahalanobis distance (8). For the intensity histogram, we convert the gray-scale intensity map of the processed image into a 32-bin histogram and compute the Mahalanobis distance between the processed image and the histogram in the model (9). The final recognition score for each component is the sum of the matching results from all three feature spaces, each multiplied by its corresponding weighting, as shown in (10):

S = w_bnd * D_bnd + w_hog * D_hog + w_int * D_int    (10)

Object hypotheses whose validation scores pass the threshold are considered correct hypotheses.
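A sketch of the component validation of (6)-(10), combining the three per-feature distances into one weighted score; it reuses mahalanobis_diag and the ComponentModel structure sketched earlier, and the pass test is our reading of the 1-standard-deviation rule quoted in the text.

```python
def validate_component(candidate_feats: dict, comp) -> tuple:
    """Score one component candidate against its model (eqs. 6-10).

    candidate_feats: {'bnd': vec, 'hog': vec, 'int': vec} extracted from the
                     candidate sub-region.
    comp: a ComponentModel as sketched earlier (mean/std per feature + weights).
    Returns (score, passed); lower scores mean a closer match.
    """
    pairs = [('bnd', comp.bnd_mean, comp.bnd_std),
             ('hog', comp.hog_mean, comp.hog_std),
             ('int', comp.int_mean, comp.int_std)]
    score = 0.0
    for name, mean, std in pairs:
        d = mahalanobis_diag(candidate_feats[name], mean, std)  # eqs. (6), (8), (9)
        score += comp.weights[name] * d                         # weighted sum, eq. (10)
    passed = score <= 1.0 * len(pairs)  # one-sigma-per-feature rule (our reading)
    return score, passed
```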

IV. EXPERIMENT

We compare the performance of the proposed semantic visual decomposition modelling (SVDM) method against popular existing recognition methods: contour template matching (CTM) [15], top-down and bottom-up segment merging/splitting (TDBU) [9], and part-based deformable models (PBDM) [19].

A. Dataset

The training samples for model construction are extracted from street scene images in the MIT LabelMe dataset [16], an online dataset allowing customised annotations at the component level. The object model is built on 40 training samples selected to contain sufficient visual detail for each annotated component. The recognition performance is evaluated on the MIT StreetScene dataset, which contains professionally labelled and verified annotations at the contextual object level. We compare the recognition results against state-of-the-art methods and against the manual annotation benchmark provided with the MIT StreetScene dataset.

B. Contour Template Matching (CTM)

Contour template matching is a simple but classic recognition method based on matching the boundary orientation of an object candidate with a contour model. The distance between the centre point and the contour intersection point at a particular angle is measured for both the object candidate and the contour model, as stated in (11):

delta(theta) = |d_candidate(theta) - d_model(theta)|    (11)

Two intersection points are considered a matching pair if their distance difference is within a pre-defined threshold, and an object instance is declared when sufficient matching pairs are identified to support the hypothesis. In this work, we set out to examine contour matching with different numbers of distance pairs between the candidate window and the vehicle model, different thresholds on the distance variation, and different thresholds on the number of matches. In general, the method executes quickly; however, the recognition is easily disturbed by noisy regions. For instance, foliage regions often match any shape, due to the large number of evenly distributed noise edges generated by illumination changes. Increasing the number of distance pairs and the thresholds improves the recognition accuracy; the consequence is that the computational complexity also increases proportionally to the number of pairs involved in the recognition.

C. Top-Down and Bottom-Up Matching (TDBU)

The combination of top-down and bottom-up matching (TDBU) is a recognition method using template matching guided by segmentation maps from the two extreme directions. Traversing the coarse segmentation maps in a top-down fashion restricts the search areas for the detected object. Once the locations of potential object hypotheses are identified, a bottom-up pass examines those locations to validate the hypotheses by merging or splitting their segments under the guidance of the detailed segmentation maps. The target image is first over-segmented, and individual segments are recursively merged based on colour and texture saliency against adjacent segments. A hierarchy of segment maps can be generated from the merging order, with a few distinctive segments at the top of the hierarchy and over-segmented regions at the bottom. Traversing the hierarchy from top to bottom, a set of hypotheses can be generated by matching the overlapping area between the template and the grouped segments at each hypothesis location.
The main drawback of the TDBU method is that it does not cope well with recognising objects against a complicated background, since its performance is heavily influenced by the initial over-segmentation: the detected object cannot be recognised if it cannot be separated from the surrounding segments in the merging decision tree. Furthermore, TDBU turns out to be computationally intensive when processing complicated real-life images in which the detected objects are small compared with the background, and the situation worsens when the detected objects are visually indistinguishable from the background areas.

D. Part-Based Deformable Modelling (PBDM)

The part-based deformable modelling method examined in this paper is based on the work of Felzenszwalb [19] and builds on the histogram of oriented gradient (HOG). Like other codebook-based approaches, the part-based deformable model is constructed in a loosely supervised manner, training the object model on labelled object samples while leaving the recognisable inner parts of the object to deform in an unsupervised way; the template model is illustrated in [19]. Hypotheses are generated through a coarse matching at the root level and are then reinforced by a deformable-parts matching, aiming to capture the detailed patterns that are not visible at the coarse level. PBDM therefore performs recognition at two levels: a quick object detection is carried out with a sliding window at the coarse root level, followed by recognition of the deformable parts at a refined level. The object recognition result is the sum of the recognition scores of the individual deformable parts, computed by comparing the HOG feature map extracted from each image patch against the object model, where the coarse-level root model and the individual deformable part models are each matched at their own scale.
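The root-plus-parts score just described can be sketched schematically; this follows the general part-based scoring form (root response plus part responses penalised by displacement), with simplified notation of our own rather than Felzenszwalb's released implementation.

```python
def pbdm_score(root_resp, part_resps, displacements, deform_cost=0.1):
    """Combine a coarse root-filter response with deformable part responses.

    root_resp:     HOG filter response of the whole-object (root) template.
    part_resps:    best filter response found for each part near its anchor.
    displacements: (dx, dy) of each part's best location from its anchor.
    Parts may move away from their anchors, but pay a quadratic
    deformation penalty for doing so.
    """
    score = root_resp
    for resp, (dx, dy) in zip(part_resps, displacements):
        score += resp - deform_cost * (dx * dx + dy * dy)
    return score

# Example: two parts, one sitting on its anchor and one displaced slightly.
s = pbdm_score(root_resp=1.2, part_resps=[0.8, 0.6],
               displacements=[(0, 0), (1, 2)])
```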

In the HOG vehicle model, the most distinctive features are concentrated around the wheel regions, while the other vehicle regions can be described by horizontal HOG features. Matching this HOG model against image patches is sufficient to robustly separate vehicle instances from the remaining regions, even in such a challenging dataset with complicated backgrounds. However, the recognition performance drops when processing images that contain other horizontal structures in the HOG feature space. Furthermore, Felzenszwalb's method only requires marking the whole training samples with bounding boxes, leaving the objects' inner components to "self-deform" based on visual integrity. Those inner parts are grouped on visual similarity alone; they encapsulate limited semantic information and thus cannot be used to help filter out semantically false positive hypotheses.

E. Semantic Visual Decomposition Modelling (SVDM)

SVDM extracts and analyses features in the form of the histogram of boundary distribution, the histogram of oriented gradient, and the intensity histogram. Instead of generating the inner components by an unsupervised deformable approach, SVDM encapsulates both the visual appearance and the spatial relationships of the inner components into the object model, based on the object annotations provided with the training dataset. In the recognition process, template matching is performed for each component, examining boundary, HOG, and intensity histogram at the specific locations given by the spatial map stored in the object model. Template matching is applied across the whole image using a sliding window approach to generate object hypotheses. Figure 4 (a) shows a processed image sample; we pass it through hypothesis generation to output a hypothesis map recording where each hypothesis is located and its similarity match against the object model, shown as intensity in the map of Figure 4 (b). For each object hypothesis generated, recursive matching is applied to identify the components that form the object and validate the hypothesis, as shown in Figure 4 (c) to (g). The final recognition result is the combination of the hypothesis generation result at the object level with the hypothesis validation result at the component level, shown in Figure 4 (h).

Fig. 4. Semantic Visual Decomposition Modelling (SVDM) process for validating the hypothesis: a) processed image, b) vehicle hypotheses map, c) head light candidates map, d) tail light candidates map, e) wheel candidates map, f) rim candidates map, g) window candidates map, h) recognition result.

Compared with PBDM, and with a more restrictive validation process deployed during recognition, the SVDM approach is able to filter out false positives generated during the hypothesis generation process. SVDM also experiences the side effect that troubles many component-based recognition methods, but unlike the others, the proposed method can recover from the effect of low thresholds, since the validation stage eliminates most of the false positives. Another type of misrecognition (false positive) can occur when recognition at the component level over-rules recognition at the vehicle level. For example, the wheels and wheel rims from two different vehicles may be extracted at their expected locations and match with high confidence, forming a spurious vehicle object: SVDM then produces a misrecognition by stitching together components from different objects that match the recognition model. This can be eliminated by increasing the threshold in the hypothesis generation stage to minimise the number of candidate objects considered, thereby preventing the object-level matching from admitting too many false positive objects.

F. Method Evaluation

For the performance comparison across the different methods, we used a subset of the MIT StreetScene dataset, randomly selecting 80 images containing 150 side-view vehicle instances and mixing them with 80 randomly selected street images with no vehicle present. The number of vehicle instances and the location of each vehicle instance are not restricted in the image set.
The manual annotation of this subset of StreetScene images is considered by the research community as the ground truth for recognition performance evaluation. To evaluate the recognition, we consider a vehicle instance to be identified correctly if the recognition result has an intersection ratio greater than 90% with the manual annotation provided in the MIT StreetScene dataset. For the recognition validation at the component level, thresholding is applied to determine whether a key component has been identified; as explained previously, the threshold is set to the mean plus 1 standard deviation.

Based on the results obtained from the experiments, the proposed SVDM method (see Figure 5 and Table I) outperformed both the CTM and TDBU methods, and also delivered a performance matching that of the HOG-based PBDM method.
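The 90% intersection-ratio criterion above can be made concrete with a small helper; whether the paper normalises the overlap by the annotation area or by the union is not stated, so the annotation-area form below is an assumption.

```python
def intersection_ratio(det: tuple, gt: tuple) -> float:
    """Ratio of the overlap area to the ground-truth box area.

    det, gt: boxes as (x_min, y_min, x_max, y_max).
    A detection counts as correct when this ratio exceeds 0.9.
    """
    ix = max(0.0, min(det[2], gt[2]) - max(det[0], gt[0]))
    iy = max(0.0, min(det[3], gt[3]) - max(det[1], gt[1]))
    gt_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return (ix * iy) / gt_area if gt_area > 0 else 0.0

# Example: a detection covering most of the annotated vehicle passes.
assert intersection_ratio((10, 10, 110, 60), (12, 12, 112, 62)) > 0.9
```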

Like the PBDM method, SVDM has a high recall (97% and 95% respectively) and therefore retrieves most objects from the scene, but the proposed SVDM method has slightly better precision (61%, against 59% for PBDM) and therefore a higher F-measure, at 0.74, the highest of all the methods compared. In general, the proposed method works well for vehicle recognition, owing to the highly structured representation of vehicles. With a tight threshold, the proposed SVDM generates more accurate recognition results than the PBDM method. However, when a loose threshold is applied, SVDM does not cope as well as PBDM and more easily confuses vehicle instances with background patches that share similar visual patterns.

Fig. 5. Performance comparison.

TABLE I. RECOGNITION RESULT COMPARISON

Method   Precision   Recall   F-measure
CTM      42%         88%      0.57
TDBU     62%         73%      0.67
PBDM     59%         97%      0.73
SVDM     61%         95%      0.74

V. CONCLUSION

In conclusion, we proposed a method to automatically construct object models by analysing the visual and semantic spatial characteristics of each object's compositional inner parts. The proposed method is built on the boundary and Histogram of Oriented Gradient (HOG) features. A comparison was carried out against existing benchmark recognition methods, namely CTM, TDBU, and PBDM, over the same dataset. The proposed SVDM method improves the recognition accuracy compared with those popular detection methods, with the added benefit of offering an automatic way of generating models for object detection. Further work will focus on extending SVDM to monitor objects from multiple classes. For example, SVDM can be extended to scene recognition by using statistical analysis focusing mainly on the co-occurrence and spatial relationships between objects of different classes instead of the visual appearance of inner object components. Other features can also be considered for the object recognition, so that more false positives can be discarded.

REFERENCES

[1] J. Harel, C. Koch, et al., "Graph-based visual saliency", Proceedings of Neural Information Processing Systems, 2006.
[2] S. Bileschi and L. Wolf, "A Unified System For Object Detection, Texture Recognition, and Context Analysis Based on the Standard Model Feature Set", British Machine Vision Conference.
[3] J. R. Smith and S.-F. Chang, "VisualSEEk: a fully automated content-based image query system", Proceedings of ACM Multimedia.
[4] Y. Rui, T. S. Huang, et al., "Relevance feedback: a power tool in interactive content-based image retrieval", IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, no. 5.
[5] J. Vogel and B. Schiele, "Semantic Modeling of Natural Scenes for Content-Based Image Retrieval", International Journal of Computer Vision, vol. 72.
[6] L. Li, R. Socher, et al., "Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework", Joint VCL-ViSU Workshop.
[7] A. Oliva and A. Torralba, "Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope", International Journal of Computer Vision, vol. 42, no. 3.
[8] M. A. Grudin, "On internal representations in face recognition systems", Pattern Recognition, vol. 33, no. 7.
[9] E. Borenstein, E. Sharon, and S. Ullman, "Combining Top-down and Bottom-up Segmentation", IEEE Conf. on Computer Vision and Pattern Recognition.
[10] A. Oliva and A. Torralba, "Modeling the shape of the scene: A holistic representation of the spatial envelope", International Journal of Computer Vision.
[11] A. Torralba, "Contextual priming for object detection", International Journal of Computer Vision.
[12] M. Boutell, A. Choudhury, J. Luo, and C. M. Brown, "Using Semantic Features for Scene Classification: How Good Do They Need to Be?", IEEE Intl. Conf. on Multimedia and Expo.
[13] M. R. Boutell, J. Luo, and C. M. Brown, "Scene Parsing Using Region-Based Generative Models", IEEE Transactions on Multimedia, vol. 9, no. 1, December 2006.
[14] G. Qin and B. Vrusias, "Adaptable Models and Semantic Filtering for Object Recognition in Street Images", Int. Conf. on Signal and Image Processing Applications.
[15] G. Qin, B. Vrusias, and L. Gilliam, "Background Filtering for Improving of Object Detection in Images", International Conference on Pattern Recognition.
[16] B. C. Russell and A. Torralba, "LabelMe: a database and web-based tool for image annotation", International Journal of Computer Vision, vol. 77.
[17] D. Martin, C. Fowlkes, and J. Malik, "Learning to detect natural image boundaries using local brightness, color, and texture cues", IEEE Trans. PAMI, vol. 26.
[18] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection", IEEE Conf. on Computer Vision and Pattern Recognition.
[19] P. F. Felzenszwalb, R. B. Girshick, et al., "Object detection with discriminatively trained part-based models", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, 2010.


More information

arxiv: v1 [cs.cv] 31 Mar 2016

arxiv: v1 [cs.cv] 31 Mar 2016 Object Boundary Guided Semantic Segmentation Qin Huang, Chunyang Xia, Wenchao Zheng, Yuhang Song, Hao Xu and C.-C. Jay Kuo arxiv:1603.09742v1 [cs.cv] 31 Mar 2016 University of Southern California Abstract.

More information

Textural Features for Image Database Retrieval

Textural Features for Image Database Retrieval Textural Features for Image Database Retrieval Selim Aksoy and Robert M. Haralick Intelligent Systems Laboratory Department of Electrical Engineering University of Washington Seattle, WA 98195-2500 {aksoy,haralick}@@isl.ee.washington.edu

More information

Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi

Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi Discovering Visual Hierarchy through Unsupervised Learning Haider Razvi hrazvi@stanford.edu 1 Introduction: We present a method for discovering visual hierarchy in a set of images. Automatically grouping

More information

SIFT - scale-invariant feature transform Konrad Schindler

SIFT - scale-invariant feature transform Konrad Schindler SIFT - scale-invariant feature transform Konrad Schindler Institute of Geodesy and Photogrammetry Invariant interest points Goal match points between images with very different scale, orientation, projective

More information

Seeing and Reading Red: Hue and Color-word Correlation in Images and Attendant Text on the WWW

Seeing and Reading Red: Hue and Color-word Correlation in Images and Attendant Text on the WWW Seeing and Reading Red: Hue and Color-word Correlation in Images and Attendant Text on the WWW Shawn Newsam School of Engineering University of California at Merced Merced, CA 9534 snewsam@ucmerced.edu

More information

Evaluation and comparison of interest points/regions

Evaluation and comparison of interest points/regions Introduction Evaluation and comparison of interest points/regions Quantitative evaluation of interest point/region detectors points / regions at the same relative location and area Repeatability rate :

More information

An Object Detection Algorithm based on Deformable Part Models with Bing Features Chunwei Li1, a and Youjun Bu1, b

An Object Detection Algorithm based on Deformable Part Models with Bing Features Chunwei Li1, a and Youjun Bu1, b 5th International Conference on Advanced Materials and Computer Science (ICAMCS 2016) An Object Detection Algorithm based on Deformable Part Models with Bing Features Chunwei Li1, a and Youjun Bu1, b 1

More information

Computer Vision. Recap: Smoothing with a Gaussian. Recap: Effect of σ on derivatives. Computer Science Tripos Part II. Dr Christopher Town

Computer Vision. Recap: Smoothing with a Gaussian. Recap: Effect of σ on derivatives. Computer Science Tripos Part II. Dr Christopher Town Recap: Smoothing with a Gaussian Computer Vision Computer Science Tripos Part II Dr Christopher Town Recall: parameter σ is the scale / width / spread of the Gaussian kernel, and controls the amount of

More information

Linear combinations of simple classifiers for the PASCAL challenge

Linear combinations of simple classifiers for the PASCAL challenge Linear combinations of simple classifiers for the PASCAL challenge Nik A. Melchior and David Lee 16 721 Advanced Perception The Robotics Institute Carnegie Mellon University Email: melchior@cmu.edu, dlee1@andrew.cmu.edu

More information

An Implementation on Histogram of Oriented Gradients for Human Detection

An Implementation on Histogram of Oriented Gradients for Human Detection An Implementation on Histogram of Oriented Gradients for Human Detection Cansın Yıldız Dept. of Computer Engineering Bilkent University Ankara,Turkey cansin@cs.bilkent.edu.tr Abstract I implemented a Histogram

More information

2/15/2009. Part-Based Models. Andrew Harp. Part Based Models. Detect object from physical arrangement of individual features

2/15/2009. Part-Based Models. Andrew Harp. Part Based Models. Detect object from physical arrangement of individual features Part-Based Models Andrew Harp Part Based Models Detect object from physical arrangement of individual features 1 Implementation Based on the Simple Parts and Structure Object Detector by R. Fergus Allows

More information

DIGITAL IMAGE ANALYSIS. Image Classification: Object-based Classification

DIGITAL IMAGE ANALYSIS. Image Classification: Object-based Classification DIGITAL IMAGE ANALYSIS Image Classification: Object-based Classification Image classification Quantitative analysis used to automate the identification of features Spectral pattern recognition Unsupervised

More information

Robotics Programming Laboratory

Robotics Programming Laboratory Chair of Software Engineering Robotics Programming Laboratory Bertrand Meyer Jiwon Shin Lecture 8: Robot Perception Perception http://pascallin.ecs.soton.ac.uk/challenges/voc/databases.html#caltech car

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK REVIEW ON CONTENT BASED IMAGE RETRIEVAL BY USING VISUAL SEARCH RANKING MS. PRAGATI

More information

Segmentation of Images

Segmentation of Images Segmentation of Images SEGMENTATION If an image has been preprocessed appropriately to remove noise and artifacts, segmentation is often the key step in interpreting the image. Image segmentation is a

More information

Very Fast Image Retrieval

Very Fast Image Retrieval Very Fast Image Retrieval Diogo André da Silva Romão Abstract Nowadays, multimedia databases are used on several areas. They can be used at home, on entertainment systems or even in professional context

More information

https://en.wikipedia.org/wiki/the_dress Recap: Viola-Jones sliding window detector Fast detection through two mechanisms Quickly eliminate unlikely windows Use features that are fast to compute Viola

More information

Detecting Digital Image Forgeries By Multi-illuminant Estimators

Detecting Digital Image Forgeries By Multi-illuminant Estimators Research Paper Volume 2 Issue 8 April 2015 International Journal of Informative & Futuristic Research ISSN (Online): 2347-1697 Detecting Digital Image Forgeries By Multi-illuminant Estimators Paper ID

More information

Improving Latent Fingerprint Matching Performance by Orientation Field Estimation using Localized Dictionaries

Improving Latent Fingerprint Matching Performance by Orientation Field Estimation using Localized Dictionaries Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 11, November 2014,

More information

Content Based Image Retrieval Using Color Quantizes, EDBTC and LBP Features

Content Based Image Retrieval Using Color Quantizes, EDBTC and LBP Features Content Based Image Retrieval Using Color Quantizes, EDBTC and LBP Features 1 Kum Sharanamma, 2 Krishnapriya Sharma 1,2 SIR MVIT Abstract- To describe the image features the Local binary pattern (LBP)

More information

Edge Detection. Computer Vision Shiv Ram Dubey, IIIT Sri City

Edge Detection. Computer Vision Shiv Ram Dubey, IIIT Sri City Edge Detection Computer Vision Shiv Ram Dubey, IIIT Sri City Previous two classes: Image Filtering Spatial domain Smoothing, sharpening, measuring texture * = FFT FFT Inverse FFT = Frequency domain Denoising,

More information

Segmentation and Tracking of Partial Planar Templates

Segmentation and Tracking of Partial Planar Templates Segmentation and Tracking of Partial Planar Templates Abdelsalam Masoud William Hoff Colorado School of Mines Colorado School of Mines Golden, CO 800 Golden, CO 800 amasoud@mines.edu whoff@mines.edu Abstract

More information

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation Object detection using Region Proposals (RCNN) Ernest Cheung COMP790-125 Presentation 1 2 Problem to solve Object detection Input: Image Output: Bounding box of the object 3 Object detection using CNN

More information

Quasi-thematic Features Detection & Tracking. Future Rover Long-Distance Autonomous Navigation

Quasi-thematic Features Detection & Tracking. Future Rover Long-Distance Autonomous Navigation Quasi-thematic Feature Detection And Tracking For Future Rover Long-Distance Autonomous Navigation Authors: Affan Shaukat, Conrad Spiteri, Yang Gao, Said Al-Milli, and Abhinav Bajpai Surrey Space Centre,

More information

Using the Deformable Part Model with Autoencoded Feature Descriptors for Object Detection

Using the Deformable Part Model with Autoencoded Feature Descriptors for Object Detection Using the Deformable Part Model with Autoencoded Feature Descriptors for Object Detection Hyunghoon Cho and David Wu December 10, 2010 1 Introduction Given its performance in recent years' PASCAL Visual

More information

Definition, Detection, and Evaluation of Meeting Events in Airport Surveillance Videos

Definition, Detection, and Evaluation of Meeting Events in Airport Surveillance Videos Definition, Detection, and Evaluation of Meeting Events in Airport Surveillance Videos Sung Chun Lee, Chang Huang, and Ram Nevatia University of Southern California, Los Angeles, CA 90089, USA sungchun@usc.edu,

More information

Towards the completion of assignment 1

Towards the completion of assignment 1 Towards the completion of assignment 1 What to do for calibration What to do for point matching What to do for tracking What to do for GUI COMPSCI 773 Feature Point Detection Why study feature point detection?

More information

Performance Evaluation Metrics and Statistics for Positional Tracker Evaluation

Performance Evaluation Metrics and Statistics for Positional Tracker Evaluation Performance Evaluation Metrics and Statistics for Positional Tracker Evaluation Chris J. Needham and Roger D. Boyle School of Computing, The University of Leeds, Leeds, LS2 9JT, UK {chrisn,roger}@comp.leeds.ac.uk

More information